13,500 people at the RAI convention center in late March. The tulips were blooming, bikes were everywhere, and every booth had some version of the same pitch.
AI. Again. Still. More.
NVIDIA joined CNCF as a platinum member and pledged $4 million in GPU access for open-source projects. IBM Research, Red Hat, and Google Cloud donated llm-d, a distributed inference framework for running LLMs on Kubernetes. The message from the main stage was that cloud native infrastructure has become the substrate for AI, not just a place to run it.
What didn't make the keynote slides: all that AI infrastructure is generating staggering volumes of telemetry, and most teams are underprepared for what that actually costs.
Hallway Math
The most useful conversations at KubeCon happened in the hallway track, at Observability Day, and over drinks in the Jordaan. Engineers kept ending up in the same place. They'd start talking about what they were building (inference pipelines, model serving on GPU nodes, agentic workflows), and eventually the conversation would arrive at cost. Not performance cost. Dollar cost. Observability budgets specifically.
Research presented at the conference gave that feeling some numbers: 80% of log data provides zero analytical value, and organizations are paying to ingest, store, and query it regardless. 54% of IT decision-makers are now getting hard questions from leadership about observability spend.
The irony is obvious in retrospect. Teams adopt AI to move faster. The observability overhead on those AI workloads slows them down.
What Observability Day Actually Said
Observability Day on Monday didn't hint at the problem; it named it directly. Session after session returned to the same friction point: telemetry volumes are growing faster than budgets, and "collect everything just in case" no longer fits anyone's budget.
One talk posed a question that made the room quiet: should you really be storing traces you might need in five years? The answer is increasingly no, but the path from "collect everything" to "collect what matters" requires rethinking how pipelines are designed from the start.
OpenTelemetry sessions ran standing-room-only, and the conversations weren't just about standardizing collection. They were about what happens after collection, specifically how to filter, aggregate, and route data so that only high-signal information reaches dashboards and alerts. The shift from "more data" to "right data" was the undercurrent of the entire week.
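To make that "right data" idea concrete, here is a toy sketch of a filter-aggregate-route stage, not tied to any particular collector or vendor implementation. The health-check pattern, the batch shape, and the repeat annotation are illustrative assumptions.

```python
# Toy sketch of a filter -> aggregate -> route stage for log telemetry.
# The health-check regex and the "[repeated Nx]" annotation are assumptions
# for illustration, not any specific collector's or vendor's behavior.
import re
from collections import Counter

HEALTH_CHECK = re.compile(r'GET /(healthz?|readyz?)" 200')

def process_batch(lines: list[str]) -> list[str]:
    """Drop low-signal lines, collapse repeats, and return what should be routed onward."""
    repeats: Counter[str] = Counter()

    for line in lines:
        # Filter: drop lines matching known low-signal patterns.
        if HEALTH_CHECK.search(line):
            continue
        # Aggregate: collapse exact duplicates within the batch into counts.
        repeats[line] += 1

    # Route: forward unique lines, annotating how often they repeated.
    return [
        line if count == 1 else f"{line}  [repeated {count}x]"
        for line, count in repeats.items()
    ]

if __name__ == "__main__":
    batch = [
        '10.0.0.5 "GET /healthz" 200',
        'payment-svc ERROR timeout calling ledger',
        'payment-svc ERROR timeout calling ledger',
        '10.0.0.6 "GET /readyz" 200',
    ]
    for line in process_batch(batch):
        print(line)  # only the error survives, with a repeat count
```

Real pipelines do this with collector processors and routing rules rather than hand-rolled scripts, but the shape of the decision is the same: what gets dropped, what gets summarized, and what gets through untouched.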
Other Things Worth Noting
Digital sovereignty moved from buzzword to practical architecture. European organizations are taking data residency, regulatory compliance, and infrastructure independence seriously, and the CNCF ecosystem is building real tooling around it.
Platform engineering is maturing. The conversation has moved past "should we build an internal developer platform" to "how do we operate one well," with golden paths, self-service tooling, and developer experience getting serious attention.
AI agents are introducing uncomfortable dynamics for open-source communities. One agent reportedly published a hit piece on a matplotlib maintainer after a PR rejection, a story that rippled through the conference and didn't resolve cleanly.
Where This Lands for Grepr
The teams we talked to at our booth aren't trying to change vendors. They've invested in their observability stack, they trust their dashboards, and they're not tearing anything out. What they can't sustain is ingesting volumes of telemetry that nobody looks at while the signals that actually matter get lost in the noise. We were amazed and heartened by how many people stopped in their tracks when they saw our booth banner, "Drowning in Expensive Telemetry Noise?" We clearly hit a nerve, and it's a pain point that needs solving.
Grepr sits between your telemetry sources and your existing observability platform. It's quick to deploy and runs either on-prem or as SaaS.
Our patented ML processes the data in real time, aggregates high-volume patterns, surfaces anomalies, and routes only the data that carries signal.
Raw data stays in your own S3 bucket in open formats (Apache Iceberg and Parquet) and is fully queryable when you need it.
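Because the raw data is plain Parquet and Iceberg in your own bucket, any engine that reads those formats can query it. As a hedged illustration (the bucket path and column names here are made up; your schema will differ), pulling yesterday's errors back out with DuckDB might look like this:

```python
# Hypothetical sketch: querying raw log telemetry stored as Parquet in your own
# S3 bucket with DuckDB. The bucket path and the column names (timestamp,
# service, level, message) are assumptions for illustration; adjust to your schema.
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs;")              # one-time install of the S3 reader
con.execute("LOAD httpfs;")                 # enables reading s3:// URLs
con.execute("SET s3_region = 'us-east-1';") # credentials come from your environment

rows = con.execute("""
    SELECT timestamp, service, message
    FROM read_parquet('s3://your-telemetry-bucket/raw/logs/*.parquet')
    WHERE level = 'ERROR'
      AND timestamp > now() - INTERVAL 1 DAY
    ORDER BY timestamp DESC
    LIMIT 100
""").fetchall()

for ts, service, message in rows:
    print(ts, service, message)
```

The point isn't the specific engine; it's that the raw data lives in open formats under your control, so "reduce what you ship to your observability platform" doesn't mean "lose the ability to go back and look."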
Most teams have their first pipeline running in about 20 minutes. The cost reduction shows up on the next invoice: usually a 90-99% volume reduction without losing what your alerts and dashboards depend on. It really is shocking how much telemetry is redundant, but don't blame the developers who produce it; they have enough on their plates delivering quality software.
If you're working through this problem, reach out. Tot ziens! See you in Salt Lake City.