Three Weeks, Three Conferences, One Clear Message About Observability Costs

Summer Lambert
December 17, 2025

We attended KubeCon, AWS re:Invent, and Gartner's IT Infrastructure conference over the past few weeks. The same cost concerns came up at all three.

At KubeCon's 10th anniversary, the OpenTelemetry sessions had standing room only. Teams are standardizing collection methods because vendor pricing varies wildly across hosts, metrics, logs, and traces. Pipeline inefficiencies now translate directly to budget overruns.

The retirement of Ingress NGINX, used in nearly half of all Kubernetes clusters, added migration planning to already stretched budgets. (O'Reilly KubeCon Recap)

AWS Focused on Agentic AI and Cost Reality

AWS re:Invent centered on agentic AI. AWS unveiled Graviton5 processors, Trainium3 UltraServers, frontier agents, and the Nova model family. All powerful technology, but the subtext remained constant: AI infrastructure demands observability, and that observability must scale economically.

One observation from the week captured the tension perfectly: "Nobody wants to be Datadog's number one customer." Being the top spender on any observability platform signals a data strategy problem, not a success metric.

AWS positioned CloudWatch updates around predictable costs and integrated data management rather than feature sprawl. The message: as AI infrastructure scales, observability needs to scale economically alongside it.

Gartner Addressed Budget Justification

The Gartner conference drew 4,000 attendees focused on strategic decisions and budget justification. Paul Delory and Autumn Stanish's opening keynote highlighted that 52% of CIOs list cost-cutting as their top 2026 priority, with AI positioned as the enabler.

Nathan Hill's presentation identified misalignment between infrastructure heads and CIOs as the primary I&O challenge. For observability, this disconnect typically results in collecting data that serves no business purpose while missing signals stakeholders actually need.

Data Volumes vs. Data Value

Teams collect far more data than they analyze. Secoda's research found 80% of log data provides no analytical value, yet companies pay to ingest, store, and query it anyway. (Source)

Here's what that looks like: a traffic spike hits, your Kubernetes cluster autoscales from 50 to 300 pods. Each new pod starts logging. Ten minutes later the spike ends, pods terminate, but you've already paid to ingest and index logs from 250 containers that no longer exist. Repeat daily.
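A quick back-of-envelope calculation makes the burst cost concrete. The per-pod log rate and per-GB price below are illustrative assumptions, not any vendor's actual numbers:

```python
# Rough cost of a short autoscaling burst: every figure here is an
# illustrative assumption (log rate, price), not real vendor pricing.
def burst_ingest_gb(extra_pods, minutes, mb_per_pod_per_min):
    """GB of logs ingested by short-lived pods during one spike."""
    return extra_pods * minutes * mb_per_pod_per_min / 1024

def burst_cost(extra_pods, minutes, mb_per_pod_per_min, usd_per_gb):
    """USD paid to ingest that spike's logs."""
    return burst_ingest_gb(extra_pods, minutes, mb_per_pod_per_min) * usd_per_gb

# 250 extra pods for 10 minutes, assuming 2 MB/pod/min and $0.50/GB:
gb_per_spike = burst_ingest_gb(250, 10, 2.0)        # about 4.9 GB
cost_per_spike = burst_cost(250, 10, 2.0, 0.50)
annual_cost = cost_per_spike * 365                  # if it repeats daily
```

The dollar amounts scale linearly with every input, which is exactly the problem: chattier pods, longer spikes, or more of them, and the bill grows with no corresponding analytical value.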

Storage costs climb, so teams add more tools to manage them, and soon you're running 10+ monitoring platforms. Vendors charge by volume, and your volumes keep growing.

Your Team Goals for 2026

Stop collecting everything by default. OpenTelemetry standardizes how you collect data, but you still decide what to keep. Intelligent sampling, storage tiering, and selective retention cut costs by 60-80% according to multiple vendors at the conferences. (Source)
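A minimal sketch of what "deciding what to keep" can look like at the collection layer: severity-aware sampling that retains every warning and error but samples routine lines hard. The sample rates and traffic mix below are assumptions for illustration, not recommendations:

```python
import random

# Severity-aware log sampling sketch: keep everything at WARNING and
# above, sample routine INFO/DEBUG lines. Rates are assumed values.
SAMPLE_RATES = {"DEBUG": 0.01, "INFO": 0.05}  # keep 1% / 5%

def keep(record, rng=random.random):
    """Return True if this log record should be retained."""
    level = record.get("level", "INFO")
    if level not in SAMPLE_RATES:   # WARNING, ERROR, CRITICAL, ...
        return True                 # always retained
    return rng() < SAMPLE_RATES[level]

# Expected retention for a workload that is 80% INFO, 15% DEBUG, 5% errors:
expected_kept = 0.80 * 0.05 + 0.15 * 0.01 + 0.05 * 1.0
# ~0.09, i.e. roughly a 90% volume reduction on this assumed mix,
# in the same range as the 60-80% figures quoted at the conferences.
```

In practice this logic usually lives in a pipeline or collector rather than application code, but the decision itself is the same: severity in, keep-or-drop out.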

Justify your observability spend. 54% of IT decision-makers now face pressure from leadership to explain observability costs. (Source) Only 17% view it as a growth investment, while 70% focus on optimization.

Focus on signals that matter. Platform teams that enforce data collection standards save money while improving incident response. Keep high-fidelity telemetry for critical paths. Sample aggressively everywhere else.
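One way to make that policy explicit is a per-endpoint fidelity decision: full-fidelity traces on critical paths, aggressive sampling everywhere else, and nothing at all for health checks. The path prefixes and rates here are hypothetical examples:

```python
# Per-endpoint trace sampling policy sketch. Prefixes and rates are
# hypothetical; plug in your own critical paths.
CRITICAL_PREFIXES = ("/checkout", "/payment")   # full fidelity
DROP_PREFIXES = ("/healthz", "/readyz")         # no analytical value
DEFAULT_RATE = 0.02                             # 2% everywhere else

def trace_sample_rate(path: str) -> float:
    """Return the fraction of traces to keep for a request path."""
    if path.startswith(DROP_PREFIXES):
        return 0.0    # discard health-check noise entirely
    if path.startswith(CRITICAL_PREFIXES):
        return 1.0    # keep every trace on revenue-critical paths
    return DEFAULT_RATE
```

The same decision table could be expressed as a sampling rule in a collector or pipeline; the point is that fidelity is chosen deliberately per signal, not inherited from a "collect everything" default.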

The Path Forward

The "collect everything" approach worked when data was cheap and budgets were unlimited. Both conditions have ended. Observability pipelines, intelligent sampling, and adaptive telemetry are table stakes now.

Your observability strategy needs to answer: 

  • What data actually improves incident response time? 
  • What can you safely sample or discard? 
  • Where do you need full-fidelity traces versus aggregated metrics?

Build these decisions into your collection layer from the start. Retrofitting cost controls after you're already paying for petabytes of low-value logs doesn't work.

Ready to optimize observability and reduce your costs while improving visibility? Grab a free demo to get your team started with observability that prioritizes signal over noise.
