
Kafka Compression Isn’t the End—We Squeezed 50% More Out

We managed to squeeze an extra 50% out of an already compressed payload. Here's how.
Idan Asulin

At Superstream, we like squeezing every drop of efficiency from data infrastructure. Recently, we tackled an already “optimized” Kafka deployment—and managed to shrink its network footprint by an additional 50%.

Even better? In some cases, we hit a 97% reduction when no compression was previously active.

Here’s how we did it—and why the standard approach to Kafka compression leaves so much on the table.


🧠 The Insight: Compression Isn't One-Size-Fits-All

Most teams handle Kafka compression by applying a single setting (like lz4 or snappy) across all clients. It’s quick. It works. But it leaves a huge optimization gap.

We realized that every application and workload type interacts differently with compression algorithms.

Some patterns benefit from Zstd’s higher ratio, others from Snappy’s speed.

The problem? No one has the time—or capability—to micro-optimize compression settings per workload.
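
One practical way to see the gap: compress a representative sample of your own records with each codec and compare sizes. Below is a minimal sketch of that idea in Java, using the zstd-jni and snappy-java libraries (our choice purely for illustration; the payload is made-up repetitive JSON, so your numbers will differ):

```java
import com.github.luben.zstd.Zstd;
import org.xerial.snappy.Snappy;

import java.nio.charset.StandardCharsets;

public class CompressionProbe {
    public static void main(String[] args) throws Exception {
        // Stand-in for a real record batch: repetitive JSON compresses very well,
        // while high-entropy data barely compresses at all.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1_000; i++) {
            sb.append("{\"user_id\":").append(i).append(",\"event\":\"click\",\"ts\":1710000000}\n");
        }
        byte[] payload = sb.toString().getBytes(StandardCharsets.UTF_8);

        byte[] zstd = Zstd.compress(payload, 3);   // zstd, ratio-oriented
        byte[] snappy = Snappy.compress(payload);  // snappy, speed-oriented

        System.out.printf("original: %d bytes%n", payload.length);
        System.out.printf("zstd:     %d bytes (%.1f%% of original)%n",
                zstd.length, 100.0 * zstd.length / payload.length);
        System.out.printf("snappy:   %d bytes (%.1f%% of original)%n",
                snappy.length, 100.0 * snappy.length / payload.length);
    }
}
```

High-entropy payloads (already-compressed or encrypted data) will show much smaller gains, which is exactly why a single global setting is rarely optimal.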

🔍 Our Approach: Workload-Aware Optimization

We flipped the problem around.

Instead of asking developers to tweak every producer, we:

  1. Analyzed workload behavior directly from the Kafka broker side
    We observed how data flows in and out—its shape, size, frequency, entropy, and access patterns.
  2. Inferred the most efficient client-side properties
    Based on our analysis, we matched each workload to the optimal:
    • Compression algorithm
    • Buffer size
    • Batch configurations
    • Other tunables that affect transmission efficiency
  3. Injected optimized settings via a lightweight package
    We created a small client-side module that overrides default Kafka producer properties dynamically, without requiring manual developer changes or coordination.
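
To make step 3 concrete, here is a hypothetical sketch of the injection idea (not our actual package): a thin factory that layers recommended settings over whatever the application already configured, so producers pick up the tuned values without code changes.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;

import java.util.Map;
import java.util.Properties;

// Hypothetical sketch of the "inject optimized settings" idea: the application keeps
// building producers as usual, and a thin wrapper layers recommendations on top.
public final class OptimizedProducers {

    // Stand-in for settings inferred from broker-side workload analysis.
    // Hard-coded here purely for illustration; in practice they would be fetched per workload.
    private static Map<String, Object> recommendedSettings() {
        return Map.of(ProducerConfig.COMPRESSION_TYPE_CONFIG, "zstd");
        // ...plus batching tunables such as batch.size and linger.ms, discussed below.
    }

    public static <K, V> KafkaProducer<K, V> create(Properties appProps) {
        Properties merged = new Properties();
        merged.putAll(appProps);               // keep everything the app configured...
        merged.putAll(recommendedSettings());  // ...then override the transmission tunables
        return new KafkaProducer<>(merged);
    }

    private OptimizedProducers() {}
}
```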

Let's zoom in for a second on batch.size and linger.ms producer settings. By increasing batch.size, we allow the producer to group more records into a single batch before sending it to the broker.

This not only amortizes compression overhead across more data, leading to significantly better compression ratios, but also reduces the number of requests per second, easing the load on brokers.
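
Some quick napkin math (the traffic figures are assumed, purely for illustration) shows how strongly batch size caps the request rate:

```java
public class BatchSizeNapkinMath {
    public static void main(String[] args) {
        // Assumed workload, purely illustrative: 10,000 records/s at roughly 1 KB each.
        long bytesPerSecond = 10_000L * 1_024;

        int defaultBatch = 16 * 1024;   // Kafka's default batch.size (16 KB)
        int largerBatch  = 256 * 1024;  // a larger batch, as suggested above

        // Upper bound on produce requests per second, assuming batches actually fill.
        System.out.printf("16 KB batches:  ~%d requests/s%n", bytesPerSecond / defaultBatch); // ~625
        System.out.printf("256 KB batches: ~%d requests/s%n", bytesPerSecond / largerBatch);  // ~39
    }
}
```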

You might be using a serverless version, but requests are still something the vendor counts and charges you for, either directly or indirectly. Paired with a higher linger.ms—which introduces a small delay to allow more records to accumulate before sending—the producer can form larger, more efficient batches, especially under moderate traffic.

While this introduces a slight increase in latency (usually in the range of a few milliseconds), the payoff in network efficiency, CPU usage, and overall throughput is more than worth it for most workloads.
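
Put together, a tuned producer configuration might look like the following. The exact numbers are illustrative, not the values our analysis would necessarily choose for your workload:

```java
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class TunedProducerProps {
    public static Properties build() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Group far more records per request than the 16 KB default, so compression
        // runs over larger, more redundant blocks and brokers see fewer requests.
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 262144);   // 256 KB; default is 16384

        // Wait a few milliseconds for batches to fill under moderate traffic.
        // This is the small latency cost mentioned above.
        props.put(ProducerConfig.LINGER_MS_CONFIG, 15);        // default is 0

        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "zstd");
        return props;
    }
}
```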

In our benchmarks, these settings were foundational to achieving the 50%+ footprint reduction.

🚀 The Results: Less Data, Less Load, More Speed

After deploying our optimization across diverse workloads, the numbers speak for themselves:

  • 📉 50%+ reduction in data footprint, even on already-compressed streams
  • 💡 60% reduction in broker resource consumption (I/O, memory, CPU)
  • 🔥 Up to 97% reduction for workloads previously uncompressed

And this all happens transparently—no refactoring, no downtime.

[Figure: "Bytes In" chart for a sample topic]

⚙️ Why This Matters

This isn't just about Kafka. It's a shift in how we think about compression and network optimization.

  • Workloads are dynamic. Compression should be too.
  • Brokers see the full picture. Leverage that visibility.
  • Optimizing transport = saving money + speeding up systems.

We believe this pattern—observability-driven, auto-tuned optimization at the infrastructure layer—is the future of high-performance data systems.
