Automatic Backfill

Steve Waterworth
August 12, 2025

First, a quick recap of what backfill is and why it's useful when resolving issues. The Grepr Intelligent Observability Data Engine sits between the log shippers and the aggregation and storage backend. Using machine learning, it automatically detects similar patterns across verbose log messages and forwards them to the backend as message summaries. Unique, low-frequency messages are passed straight through. All data sent to Grepr is retained in low-cost storage and is accessible at any time via well-known query syntaxes: Splunk, Datadog, and New Relic. The results of a query may optionally be submitted as a backfill job, populating the backend with all the messages that were originally sent only as summaries. This provides an unabridged dataset covering the time period of the issue under investigation.

Manual backfill is of course very useful, but it would be more convenient and quicker for it to happen automatically when there is an issue. There are a couple of ways to achieve this.

Webhook

Monitoring systems can be configured to call a webhook with event details when an alert is triggered. A webhook handler can be written to inspect those details, build a query, and submit it to Grepr as a backfill job via its REST API. See Dynamic Callback Exceptions in the documentation.
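As a rough illustration, here is a minimal sketch of such a handler in Python using Flask. The alert payload fields, the Grepr endpoint URL, and the request body shape are assumptions made for the example; consult the Grepr REST API documentation for the actual contract.

```python
from datetime import datetime, timedelta, timezone

import requests
from flask import Flask, request, jsonify

app = Flask(__name__)

# Hypothetical endpoint and token; check the Grepr API docs for the real URL
# and request schema. In practice, load the token from a secret store.
GREPR_API = "https://api.grepr.ai/v1/backfill-jobs"
GREPR_TOKEN = "GREPR_API_TOKEN"


@app.post("/alert-webhook")
def alert_webhook():
    event = request.get_json(force=True)

    # Field names vary by monitoring system; adjust to match your alert payload.
    service = event.get("service", "unknown")
    host = event.get("host", "")

    # Backfill the hour leading up to the alert.
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=1)

    job = {
        "query": f"service:{service} host:{host}",
        "start": start.isoformat(),
        "end": end.isoformat(),
    }
    resp = requests.post(
        GREPR_API,
        json=job,
        headers={"Authorization": f"Bearer {GREPR_TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()
    return jsonify({"backfill_job": resp.json()}), 202
```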

Datadog

A webhook can be called via a special notation in the message text of the alert.
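For example, assuming a webhook integration named grepr-backfill has been set up in Datadog, mentioning it in the monitor message fires it along with the alert:

```
{{#is_alert}}
High error rate on {{host.name}}, triggering Grepr backfill.
@webhook-grepr-backfill
{{/is_alert}}
```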

New Relic

A webhook can be added to an alert workflow.
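The workflow's webhook destination sends a configurable JSON payload. A template roughly along these lines (the variable names are illustrative and should be checked against New Relic's payload templating documentation) could pass the alert context through to the handler:

```json
{
  "issueTitle": {{ json issueTitle }},
  "entities": {{ json entitiesData.names }},
  "createdAt": {{ json createdAt }}
}
```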

Splunk

A webhook can be configured as an alert action.
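The webhook alert action POSTs a JSON document describing the alert, from which the handler can pull the trigger context. An illustrative example of its general shape, not the exact schema:

```json
{
  "search_name": "rs-payment errors",
  "results_link": "https://splunk.example.com/app/search/...",
  "result": {
    "host": "web-01",
    "service": "rs-payment",
    "status": "error"
  }
}
```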

Rule Engine

Grepr has rule-based exceptions that trigger when a condition is met, stopping aggregation and performing a backfill for a period of time before the event. To configure this, go to the Grepr dashboard and drill into the details for your pipeline. Under Exceptions, click the Add button.

Enter a query to select the trigger condition, as shown below. In this example, when the service “rs-payment” logs a message with an “error” status, it triggers a backfill of data from up to one hour before. The scoping tag “host” means that only messages from the same host as the trigger are backfilled. Alternative tags include pod-id, container-id, namespace, etc. This ensures that the required insight is available without dumping too much data into the aggregation backend, minimising usage costs.
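As an illustration, using Datadog-style query syntax (adjust to whichever syntax your pipeline uses), the trigger condition for this example might look like:

```
service:rs-payment status:error
```

combined with a scoping tag of host and a lookback window of one hour.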

The status of the exception is shown once it has been created.

This shows that the rule has just been triggered and the backfill job has started. Once it completes a few moments later, all the data will be available to view in the aggregation tool.

Automatic Convenience

Performing manual, ad-hoc backfill queries provides ultimate flexibility when investigating issues. However, fully automating backfill on issue detection saves time and streamlines the workflow. Grepr gives you the option of using both approaches for optimal operational efficiency. When using automatically triggered backfills, consider carefully how often they might fire: over-triggering backfill jobs sends more data to the aggregation backend, reducing the potential usage cost savings that Grepr provides.
