Remove Sensitive Data From Your Logs With the SQL Transform

Jonathan Seidman and Steve Waterworth
December 29, 2025

The Grepr Intelligent Observability Data Engine lets you cut observability costs by reducing the volume of log data sent to your monitoring platform, while giving you full control over how that data is shaped. With Grepr’s flexible log processing pipelines, you can use rich tools to process, transform, and enrich raw log events, turning noisy streams into structured, searchable information. This functionality enables you to improve searchability, build reliable datasets for analysis and reporting, and tailor events to meet specific operational and business requirements.

One of the tools available in a Grepr pipeline is the SQL transform, which lets you use the familiar SQL syntax to perform complex transformations on log events. You can include the SQL transform in the pre-parsing, post-parsing, and post-warehouse filter steps of a pipeline.
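To give a feel for the shape of a SQL transform, here is a hypothetical sketch of a view that tags each event with a priority derived from its message. The table name (logs) and column name (message) are assumptions borrowed from the redaction example later in this post; your pipeline's schema may differ.

-- Hypothetical sketch: derive a priority column from the log message.
SELECT
  *,
  CASE
    WHEN message LIKE '%ERROR%' THEN 'high'
    WHEN message LIKE '%WARN%' THEN 'medium'
    ELSE 'low'
  END AS priority
FROM logs

A derived column like this can then be used by downstream pipeline steps or by queries against the data lake.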

The SQL Transform In Action: Redacting Sensitive Information From Events

While Personally Identifiable Information (PII), such as credit card numbers, passwords, or user names, should never be written to logs in plain text, coding errors or other issues can sometimes let PII find its way into logs. In this example, a developer testing code changes added DEBUG messages that log the user's plain-text password. Unfortunately, an error caused a development configuration to be deployed to one of the production hosts, so DEBUG messages, including the passwords, were logged on that host.

2024-12-26 10:08:41 INFO - Successful login for user 'ivan' from IP 10.0.0.88
2024-12-26 10:09:56 DEBUG - Successful login for user 'julia' password '*9vpkZQq_e' from IP 10.0.2.15
2024-12-26 10:10:14 WARN - Failed login attempt for user 'kevin' from IP 10.0.4.92 account not found
2024-12-26 10:11:28 ERROR - Account 'laura' locked due to multiple failed attempts from IP 10.0.1.56
2024-12-26 10:05:18 DEBUG - Successful login for user 'fiona' password '3aK-DyK_9R' from IP 10.0.0.19

This issue needs a quick mitigation while a code fix is in progress. Fortunately, the SQL transform can be used in a Grepr pipeline to remove this sensitive data. To ensure the data is redacted before events are stored or forwarded, the redaction must occur before the data lake step in the pipeline. In this example, we'll configure the transform in the post-parsing filter.

To add the SQL transform to a pipeline, in the Grepr UI, go to the details page for your pipeline. Select Post-parsing filter from the left-hand menu, and click the pencil icon to display the filter configuration form.

The pipeline details page, showing how to edit a filter.

Only log events from the login service need redaction, so we can reduce the processing load by passing only those events to the SQL transform. All other events should be passed through to the next pipeline step. 

To route events based on the service, we select Enable data passthrough to next step and add the filter query -service:login. This query matches every event that is not from the login service and passes it straight through to the next step in the pipeline; only events from the login service remain to be processed by the SQL transform.

Then, selecting Enable SQL processing and Process only data not passed through to next step enables the SQL transform flow.

The filter configuration form with settings to pass specific events to the next pipeline step, and process all other events using SQL.

We then click the plus sign under SQL Views, enter a name for the view that will contain our processed events, and enter the following query in the SQL field. This query uses a CASE expression to do the following: if the log event contains the text 'password', a regular expression overwrites the password value with a series of asterisks; otherwise, the log event is unchanged.

SELECT
  CASE
    WHEN message LIKE '%password %'
    THEN REGEXP_REPLACE(message, 'password ''[^'']+''', 'password ''*****''')
    ELSE message
  END AS message,
  *
FROM logs

This query ensures that the plain-text passwords are masked before events are saved to the data lake and forwarded to an observability platform:

2024-12-26 10:09:56 DEBUG - Successful login for user 'julia' password '*****' from IP 10.0.2.15

Clicking the plus sign in the Outputs section, entering the name of the view containing processed events, and selecting Data Lake configures the SQL transform to pass the events to the next step in the pipeline.

When we look at the log messages from the login service forwarded to Datadog, we can see that the passwords are masked while the other messages are forwarded unchanged.
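The same chaining technique extends to other sensitive fields. As a hypothetical variant, not part of the pipeline built above, the query below wraps the CASE expression in a second REGEXP_REPLACE so that user names quoted after the word 'user' are masked as well:

-- Hypothetical extension: also mask quoted user names.
SELECT
  REGEXP_REPLACE(
    CASE
      WHEN message LIKE '%password %'
      THEN REGEXP_REPLACE(message, 'password ''[^'']+''', 'password ''*****''')
      ELSE message
    END,
    'user ''[^'']+''',
    'user ''*****''') AS message,
  *
FROM logs

Note that masking user names removes useful audit context from login events, so treat this as an illustration of chaining replacements rather than a recommendation; whether to redact user names depends on your compliance requirements.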

To learn more, see Transform events with the SQL operation in the Grepr documentation.
