All Observability Data Is Equal But Some Is More Equal Than Others

Steve Waterworth
June 24, 2025

Observability vendors like to tell us that all data is important all the time, and that if you do not collect absolutely everything in microscopic detail you’ll regret it. They promote FOMO (fear of missing out), preaching that the piece of data you chose not to collect is exactly the item you need to solve an issue, and that there’s no going back in time to get it; the horse has bolted and you missed the chance to close the stable door. Of course, they often fail to point out that they charge by data volume; the more you send, the better it is for their shareholders.

Different Levels Of Equality

Observability data falls into one of two categories: heartbeat or heart attack. Some data is required all the time to provide heartbeat confirmation; the absence of this data is itself a cause for alarm. Take the access logs for a web server: it’s comforting to see requests passing through with 2XX status codes, giving us a warm feeling that the website is up and serving requests in a prompt and error-free manner.

Should the web server access log fall silent, this is a cause for concern. Either our website has suddenly become immensely unpopular or something broke. It is under these heart attack circumstances that the rest of the observability data suddenly becomes handy.

Every Heartbeat

It’s agreed that heartbeat observability data is essential to ensure service continuity, since it provides the data for alerting. But is every log line really necessary? Can we skip a heartbeat or two? There’s that FOMO feeling again. All that is really required to know all is well is that the number of 2XX status log entries is steadily increasing. However, if those log entries are being used to create a metric, then skipping a beat or two would compromise that metric’s accuracy.
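
To make the heartbeat idea concrete, here is a minimal sketch of a check that treats 2XX access-log entries as a pulse and alarms when they stop. The Apache-style log format and the 60-second silence window are illustrative assumptions, not anything prescribed by Grepr.

```python
# A minimal sketch of a heartbeat check built from access logs. The
# Apache-style log format and 60-second window are illustrative assumptions.
import re
import time
from collections import deque

STATUS_2XX = re.compile(r'"\s(2\d{2})\s')  # status code after the quoted request
recent_hits = deque()                      # timestamps of recent 2XX requests

def record(line: str) -> None:
    """Record a 2XX access-log line as one heartbeat."""
    if STATUS_2XX.search(line):
        recent_hits.append(time.time())

def heartbeat_ok(window_seconds: int = 60) -> bool:
    """Return True if at least one 2XX entry arrived within the window."""
    cutoff = time.time() - window_seconds
    while recent_hits and recent_hits[0] < cutoff:
        recent_hits.popleft()
    return bool(recent_hits)

# The silent log is the alarm condition: either traffic vanished or something broke.
record('127.0.0.1 - - [24/Jun/2025:10:00:00 +0000] "GET / HTTP/1.1" 200 512')
if not heartbeat_ok():
    print("ALERT: no 2XX requests in the last 60 seconds")
```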

Step Back In Time

When attempting to resolve an issue, every morsel of observability data is indispensable for determining the root cause. This is where FOMO really kicks in and Murphy’s Law comes into force: it is a near certainty that the data you chose not to collect is exactly the data you need. If only it were possible to step back in time and collect the missing data, the root cause would be obvious and service level objectives would be promptly restored.

Heartbeat Summaries

Grepr’s AI-powered data pipeline analyzes the semantics of observability data in real time. It dynamically manages a set of transformation rules that reduce data volume by 90%, consequently reducing observability platform costs by a similar amount. Taking web server access logs as an example, frequent messages are summarized while sparse messages are passed straight through.

As well as summarizing, the dynamic transformation rules also add supplementary metadata. The extra metadata is used to ensure total accuracy of any derived metrics.

The grepr.repeatCount field is used as the basis for metric generation; this will be 1 for messages that are not summarized.
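
As a sketch of how that works in practice, the snippet below reconstructs an exact request count from summarized events. Only the grepr.repeatCount field name comes from the text above; the event shape is an illustrative assumption.

```python
# A minimal sketch of metric generation from summarized logs. Only the
# grepr.repeatCount field name comes from the text above; the event shape
# here is an illustrative assumption.
events = [
    {"message": "GET /health HTTP/1.1 200", "grepr.repeatCount": 1},       # passed through
    {"message": "GET / HTTP/1.1 200 (summary)", "grepr.repeatCount": 847}, # summarized
]

# Summing grepr.repeatCount rather than counting events keeps the derived
# metric exact: each summary stands in for every raw line it replaced.
total_requests = sum(e.get("grepr.repeatCount", 1) for e in events)
print(total_requests)  # 848, the same count the raw unsummarized logs would yield
```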

Time Travel Is Possible

All data sent to Grepr is retained in low-cost storage; no data is dropped. It gets even better: that data is available to query with a language you are already familiar with, whether Splunk, Datadog, or New Relic, with more coming soon. Using the Grepr web dashboard, the entire dataset is available to assist in identifying the root cause of issues. The results of a query can optionally be submitted as a backfill job, making the data quickly available on your observability platform. It’s like stepping back in time and turning up the detail just before the issue occurred. Continue to use the same workflows in the tool you are already familiar with: zero disruption and no learning curve. Backfill jobs can also be triggered automatically via webhooks and query matches.
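
To illustrate the automatic trigger, here is a minimal sketch of a webhook handler that submits a backfill job for an alert’s time window. The endpoint URL, payload fields, and token are all hypothetical placeholders; Grepr’s actual API is not described here.

```python
# A minimal sketch of an alert webhook handler that requests a backfill for
# the incident window. The endpoint URL, payload fields, and token are
# hypothetical placeholders, not Grepr's actual API.
import json
import urllib.request

BACKFILL_URL = "https://grepr.example.invalid/v1/backfills"  # placeholder endpoint
API_TOKEN = "REPLACE_ME"                                     # placeholder credential

def on_alert(alert: dict) -> None:
    """Invoked by the monitoring tool's webhook when an alert fires."""
    payload = json.dumps({
        "query": alert["query"],         # reuse the query that fired the alert
        "start": alert["window_start"],  # backfill the incident time window
        "end": alert["window_end"],
    }).encode()
    request = urllib.request.Request(
        BACKFILL_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        print("backfill job accepted:", response.status)
```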

Don’t Miss Out

With Grepr you no longer have the fear of missing out: you can collect it all without fearing the cost implications. Heartbeat data is summarized, and heart attack response data is immediately available via backfill. With Grepr you achieve 100% insight with 10% of the data while containing observability platform expenditure.
