Logging everything made sense until it didn't.
Your software is writing a log right now. Every request, every failure, every query that takes longer than it should gets captured somewhere, and for most engineering teams, that somewhere has become a significant and surprisingly fast-growing cost. App logging, the practice of recording events, errors, and state changes as software runs, is not new. What is new is the bill it generates at scale.
What App Logging Actually Does
Something happens in your application, a request comes in, a service connection fails, a query runs slowly, and a log captures it as a timestamped record. Most environments generate several types: application logs, event logs for system actions, and audit logs for user activity, all of which flow into the same downstream platforms and have their own volume and retention requirements. Format matters more than teams expect: unstructured logs are plain text, which works at low volume but is not queryable at scale. Structured logs encode the same information as key-value pairs or JSON, making them queryable by tooling.
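The difference is easy to see in code. Below is a minimal sketch of a structured formatter using only Python's standard library; the field names are illustrative, not a fixed schema:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line (structured logging)."""
    def format(self, record):
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

formatter = JsonFormatter()
record = logging.LogRecord("payments", logging.WARNING, "app.py", 1,
                           "payment failed for order %s", ("o-123",), None)
line = formatter.format(record)
# line is a JSON string whose fields downstream tooling can filter on directly
```

The same event as unstructured text would need regex parsing downstream; as JSON, every field is directly addressable by filters, routers, and dedup logic.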
You cannot debug what you did not capture, and nobody knows in advance which log line will matter during an incident. That is why the instinct to log everything exists. It is also, without any downstream filtering logic, where the cost problem starts.
Why App Logging Costs Scale Faster Than Your User Base
The volume problem is not what most teams expect. User growth barely moves the needle. What moves it is complexity: more services, more instrumentation, third-party integrations emitting events nobody asked for and nobody turned off. It all ships to your observability platform without anyone reviewing it first, and Datadog, Splunk, and New Relic charge based on how much arrives, with retention sometimes adding another layer of cost on top.
A few sources tend to account for most of the bloat. Noise is the most visible: debug-level or other verbose output left enabled after deployment, health check pings from load balancers, or a single misconfigured service throwing the same error thousands of times a minute. Duplicate events are subtler: the same transaction logged by the API gateway, the service mesh, and the application layer, each unaware that the others are writing the same thing. High-cardinality fields, things like unique request IDs embedded in every log line, inflate storage costs even when the actual signal inside those logs is not growing.
Where App Logging Costs Hide Inside Your Observability Stack
When teams add distributed tracing without revisiting their logging strategy, the same operational context often ends up captured in both logs and traces. Nobody planned for it, it just happened as instrumentation expanded. Observability platforms have made it easy to add coverage; auditing what you already have and deciding what to stop paying for is a different problem, and it almost never happens automatically.
App Logging Mistakes That Drive Up Observability Bills
Logging at DEBUG in production. A single busy service can generate 50x more log volume at DEBUG than at INFO, and it usually takes an unexpected bill to surface that a debug flag was left on after deployment.
Retaining logs longer than your incident response window requires. If your team realistically looks back 14 days during an incident, storing 90 days of verbose application logs is storage you are paying for and will not use.
Logging full request and response bodies by default. Useful for debugging specific issues in a controlled context. As a standing default at meaningful traffic volume, it creates both a cost problem and a compliance liability.
Ignoring log volume from service dependencies. Third-party integrations, managed services, and vendor SDKs often emit verbose logs by default. Teams tend to discover this on their observability bill, not in a code review.
Sending everything to one destination. Routing all logs to a single, expensive, fast-query backend regardless of whether those logs will ever be queried interactively is the operational equivalent of storing your tax documents and junk mail in the same safe deposit box.
How to Reduce App Logging Costs Without Losing Observability Coverage
Setting appropriate log levels and dropping redundant fields is good advice that stops being sufficient around the time your observability bill starts looking alarming. At volume, cost control requires intervention between your applications and your observability backends.
Filtering at the pipeline layer means dropping health-check noise, debug-level output, and known-useless events before they reach paid ingestion, applied through a centralized pipeline rather than inside individual applications where every team owns their own instrumentation.
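As a sketch of the idea, assuming a simple dict-per-event shape (the field names and drop rules are illustrative, not a real pipeline config):

```python
def should_drop(event: dict) -> bool:
    """Pipeline-level filter: drop known-useless events before paid ingestion."""
    if event.get("level") == "DEBUG":
        return True  # debug output has no place in production ingestion
    if event.get("path") in {"/healthz", "/readyz"}:
        return True  # load-balancer health checks carry no incident signal
    return False

events = [
    {"level": "INFO", "path": "/healthz"},
    {"level": "DEBUG", "path": "/api/orders"},
    {"level": "ERROR", "path": "/api/orders"},
]
kept = [e for e in events if not should_drop(e)]
# only the ERROR event survives to paid ingestion
```

Because the rules live in one place rather than in every application, tightening or loosening them does not require coordinating deployments across teams.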
Routing by value recognizes that not all logs need the same treatment. Security-relevant audit logs need long retention and fast query access. Verbose service debug output might only need to exist for 24 hours. Routing logs to different backends based on content and criticality cuts costs without reducing coverage.
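The routing decision itself can be a small, explicit function. The backend names and rules below are hypothetical:

```python
def route(event: dict) -> str:
    """Pick a destination backend by content and criticality."""
    if event.get("type") == "audit":
        return "secure-archive"   # long retention, fast query access
    if event.get("level") in {"ERROR", "WARNING"}:
        return "hot-search"       # interactive debugging during incidents
    return "cold-storage"         # cheap bulk storage, short retention
```

The point is that retention and query cost become per-stream decisions instead of one expensive default applied to everything.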
Deduplication handles the repeated error messages from a single failing pod that do not need 10,000 individual log events. Aggregating them into counts preserves the signal while collapsing the volume.
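A minimal sketch of count-based aggregation, assuming events are dicts keyed by service and message:

```python
from collections import Counter

def deduplicate(events):
    """Collapse identical (service, message) pairs into one event plus a count."""
    counts = Counter((e["service"], e["message"]) for e in events)
    return [
        {"service": svc, "message": msg, "count": n}
        for (svc, msg), n in counts.items()
    ]

# A single failing pod repeating the same error 10,000 times...
burst = [{"service": "pod-7", "message": "db connection refused"}] * 10_000
collapsed = deduplicate(burst)
# ...becomes one event carrying count=10000: same signal, a fraction of the volume
```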
Sampling works for high-volume, low-priority streams like health checks or routine API calls with no anomalies. Sampling at 1% or 5% captures enough to confirm normal behavior at a fraction of the ingestion cost.
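Head-based sampling can be a few lines. The rule below always keeps errors and keeps roughly 1% of everything else; the rate and field names are illustrative:

```python
import random

def keep(event: dict, rate: float) -> bool:
    """Probabilistic head sampling: always keep errors, sample the rest."""
    if event.get("level") == "ERROR":
        return True
    return random.random() < rate

random.seed(7)  # seeded only so this example is reproducible
health_checks = [{"level": "INFO", "path": "/healthz"}] * 100_000
kept = sum(keep(e, 0.01) for e in health_checks)
# kept lands near 1,000 of 100,000 events: enough to confirm normal behavior
```

The always-keep branch for errors is what makes sampling safe for incident response: you are thinning the routine traffic, not the signal.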
Grepr is built for this layer, sitting between your log sources and your observability backends and applying filter, route, deduplicate, and sample logic before data hits ingestion.
How to Get App Logging Under Control
Start by figuring out what is actually driving your volume, broken down by service, log level, and message type. Most teams find the answer uncomfortable: a small number of sources, often ones nobody has looked at in months, account for most of what they are paying to store. Debug output left on in production can usually be filtered before it hits ingestion. High-volume, low-signal streams like health checks can be sampled. And some sources, such as verbose SDK output, redundant service mesh events, and full request bodies logged by default, should probably not exist in production at all.
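That breakdown can start as something very simple before any tooling is involved. A sketch assuming each event carries `service` and `level` fields:

```python
from collections import Counter

def volume_breakdown(events):
    """Tally log volume by (service, level) to find the loudest sources."""
    counts = Counter((e["service"], e["level"]) for e in events)
    # Largest contributors first, i.e. what you are paying the most to store.
    return counts.most_common()

events = (
    [{"service": "gateway", "level": "DEBUG"}] * 800
    + [{"service": "checkout", "level": "INFO"}] * 150
    + [{"service": "checkout", "level": "ERROR"}] * 50
)
top = volume_breakdown(events)
# top[0] is ("gateway", "DEBUG") with 800 events: the first filtering candidate
```

In practice you would run the equivalent query against a day of ingestion in your observability platform, but the shape of the answer is the same: a ranked list of sources, with a short head responsible for most of the bill.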
Even teams that clean this up are still mostly reacting: waiting for an alert, asking why something broke after it already did. Grepr was built for both problems. Its telemetry reduction engine cuts log volume in real time by surfacing behavioral patterns rather than passing every raw event downstream, and its AI agent proactively monitors your environment, watching for anomalies and business-level misbehavior before a threshold is crossed or a customer notices. Start for free today.
FAQ: App Logging, Log Management, and Observability Costs
What is app logging, and do I actually need it?
Production systems need logging. What varies is how much, at what verbosity, and for how long. Teams running distributed systems need logs to reconstruct what happened during an incident, but they rarely need every log at full fidelity indefinitely. The cost comes from treating all of it as equally necessary, which most default configurations quietly do.
Why is my observability bill so high if I have the same number of users?
User count is rarely what drives log volume. Complexity does. A single new microservice, a noisy third-party integration, or a debug flag left on in production can spike ingestion costs without any change in traffic, which makes it one of the harder cost increases to explain to finance.
What is the difference between app logging and application performance monitoring (APM)?
APM tools typically combine metrics, traces, and logs into a unified view of application behavior. App logging is one component of that. The overlap between log data and trace data is where teams often end up paying for the same operational context twice, which is worth examining if you have both running.
How do I reduce log ingestion costs without breaking incident response?
Pipeline-level filtering and routing, applied before data reaches your observability platform, is the most durable approach. The risk of dropping something important is real but manageable if you work from actual query history rather than assumptions. Grepr handles this upstream of your backends, so the filtering logic stays outside your application code and can be adjusted without a deployment.
What are structured logs, and why do they matter for cost?
Structured logs format log data as key-value pairs or JSON rather than plain text strings, making them queryable by machines. This matters for cost because unstructured logs are harder to filter, deduplicate, and sample effectively at the pipeline layer. Structured logging is a prerequisite for most meaningful cost optimization work.
What log levels should I use in production?
INFO as the baseline, WARNING and ERROR for conditions worth investigating. DEBUG belongs in development and staging. The exception is active troubleshooting, where enabling DEBUG temporarily is fine, but it needs to come back off, and that last step is where things tend to go sideways.