Stop Copy-Pasting Producer Settings Across Kafka Source Connectors
Real numbers for batch.size, linger.ms, and compression.type, plus a method you can reproduce
When I first operationalized Kafka Source Connectors, I gave them all the same producer settings because… “consistency.” It worked, but it was inefficient. Once I tuned per workload, I cut request rates by 4–7× and shaved double-digit % off broker/network costs.
This post shows:
- Concrete, reproducible numbers for MySQL CDC, MongoDB CDC, and an S3 Source workload.
- What “analyze your workload” actually means.
- A simple decision procedure to pick batch.size, linger.ms, and compression.type that you can apply to any connector.
Note on config names: Per-connector overrides go in the connector config as producer.override.* (e.g., producer.override.linger.ms). Worker-wide defaults live in the Connect worker config as producer.*. I'm showing per-connector overrides here.
TL;DR recommendations (then I justify them)

Defaults (batch.size=16 KB, linger.ms=0, compression=none) are rarely good:
- With small records, 16 KB caps you at single-digit records per request; 0 ms linger prevents batch fill.
- With big records, batch.size < record size forces 1 record per request anyway; you lose batching benefits.
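The arithmetic behind both bullets, as a minimal sketch using the mean record sizes from the measurements that follow:

```python
# Ceiling on records per request imposed by the default batch.size=16 KB,
# using the mean record sizes measured in the workloads below.
DEFAULT_BATCH = 16 * 1024
for workload, mean_bytes in [("MySQL CDC", 1_400), ("MongoDB CDC", 3_800), ("S3 NDJSON", 150_000)]:
    print(f"{workload}: at most {DEFAULT_BATCH // mean_bytes} records per batch")
# MySQL CDC: at most 11 records per batch
# MongoDB CDC: at most 4 records per batch
# S3 NDJSON: at most 0 records per batch (each record overflows the batch and ships alone)
```

And that is only the ceiling: with linger.ms=0 the producer sends whatever has accumulated the moment the sender thread comes around, which is why the measured averages below sit even lower.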
What I measured (and why the values make sense)
Cluster: Kafka 3.7, 3× brokers (m5.2xlarge-ish), RF=3, 1 Gbps network
Runs: 10-minute steady state at ~70–80% of connector max throughput
Metrics: Producer JMX (records-per-request-avg, batch-size-avg, compression-rate-avg, request-rate, record-send-rate, bufferpool-wait-time-total), broker ingress bytes/s, p50/p99 end-to-end latency at the topic.
1) MySQL CDC (Debezium) — small JSON records
- Workload stats: mean 1.4 KB, p95 3.2 KB, highly compressible (zstd ≈ 0.28 ratio on samples).
- Baseline (defaults): batch.size=16 KB, linger.ms=0, compression=none → records-per-request-avg ≈ 4–7 (can't fill more), request-rate high, broker ingress ~25 MB/s.
- Optimized: batch.size=128 KB, linger.ms=15 ms, compression=zstd → records-per-request-avg ≈ 60–90, request-rate ↓ ~70%, broker ingress ~44–50 MB/s (same source rate, fewer headers/overheads), p50 latency +10–15 ms, p99 still < 150 ms.
- Why 128 KB? With 1.4 KB records, a target of ~70–90 records/batch needs ~100–125 KB of uncompressed buffer. 15 ms of linger is enough to fill at typical CDC rates without blowing out tail latency. zstd wins on JSON.
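To sanity-check that sizing, here is the arithmetic as a minimal Python sketch. The record size and targets are the numbers quoted above; the per-partition rate is an assumed, illustrative value, not a measurement from these runs.

```python
# Back-of-the-envelope check for the MySQL CDC sizing above.
mean_record_bytes = 1_400   # ~1.4 KB mean serialized record size
target_records = 90         # upper end of the ~70–90 records/batch target

# Uncompressed buffer needed to hold the target batch:
needed_batch_bytes = target_records * mean_record_bytes
print(f"needed batch.size ≈ {needed_batch_bytes / 1024:.0f} KB")  # ≈ 123 KB → 128 KB fits comfortably

# Per-partition rate at which a 15 ms linger fills that batch completely
# (assumed rate for illustration):
linger_ms = 15
fill_rate = target_records / (linger_ms / 1000)
print(f"fills within {linger_ms} ms at ≈ {fill_rate:.0f} records/s per partition")  # ≈ 6000
```

At lower per-partition rates the batch simply ships when linger expires, which is consistent with records-per-request-avg landing in the 60–90 range rather than pinned at the cap.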
2) MongoDB CDC (Debezium) — larger JSON docs
- Workload stats: mean 3.8 KB, p95 12 KB, compressible (zstd ≈ 0.35 ratio).
- Baseline (defaults) → records-per-request-avg ≈ 2–4, visible CPU spikes on brokers due to many small appends.
- Optimized: batch.size=256 KB, linger.ms=15 ms, compression=zstd → records-per-request-avg ≈ 45–65, request-rate ↓ ~60–65%, throughput +~2× vs baseline, p50 +~12 ms, p99 < 200 ms.
- Why 256 KB? p95 is 12 KB; aiming for ~50 records/batch at p95 would mean ~600 KB uncompressed, which is overkill. Batches saturate earlier anyway due to partitions and in-flight limits; empirically, 256 KB gets you most of the win without memory pressure.
3) S3 Source — NDJSON ~150 KB per record
- Workload stats: mean 150 KB, p95 400 KB, mixed compressibility (lz4 ≈ 0.8, zstd ≈ 0.6, but zstd CPU ~2–3× lz4 for marginal wins here).
- Baseline (defaults) → because batch.size (16 KB) < record size, you effectively send 1 record per request; request-rate is needlessly high; the network spends too much time on per-request overhead.
- Optimized: batch.size=1 MB, linger.ms=10 ms, compression=lz4 → records-per-request-avg ≈ 5–7 (p95 records still send alone), request-rate ↓ ~4–6×, broker ingress +~30–40% vs baseline due to fewer headers and better IO patterns. Latency change negligible (big records dominate).
- Why lz4? At these sizes, CPU becomes limiting before the network; lz4 yields practical wins with minimal CPU. If your instances are CPU-rich and egress cost matters, zstd can still be worth it.
Guardrails: For big records, ensure max.request.size and topic max.message.bytes comfortably exceed your p95 (plus headers). For tiny records, ensure buffer.memory is adequate if you raise linger/batch.
What “analyze your workload” actually means
Don’t guess. Measure these five things first:
- Record size distribution (uncompressed)
  - Collect a statistically meaningful sample (≥100k records); see the sampling sketch after this list.
  - Compute mean, p50, p95, p99 of serialized record size in bytes.
  - This predicts feasible records per batch given a batch.size.
- Compressibility
  - On your sample, test zstd -1, lz4, and (optionally) gzip.
  - Capture compression ratio (compressed/uncompressed) and CPU time on your target instance class.
  - Prefer the codec with the best bytes-saved per CPU-second for your payload.
- Steady-state record rate
  - Measure records/sec per partition and overall.
  - This plus your target records per batch gives a linger budget: linger.ms ≈ 1000 * target_records_per_batch / steady_records_per_sec_per_partition
  - Cap linger at your end-to-end latency SLO.
- Producer fill metrics
  - Watch records-per-request-avg and batch-size-avg (JMX).
  - If batch-size-avg sits below 40–60% of the configured batch.size, you're not filling; reduce batch.size or increase linger.ms.
- Backpressure / memory
  - Check bufferpool-wait-time-total and rejected sends.
  - If you see waits, lower linger/batch or raise buffer.memory.
  - Keep delivery.timeout.ms sane when increasing linger.
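Here is a minimal sketch of the first two steps (size distribution and compressibility), assuming confluent-kafka for consuming plus the zstandard and lz4 packages; the bootstrap servers, topic name, and group id are placeholders.

```python
# Sample records from a connector's output topic, then compute size percentiles
# and rough compression ratios. Assumes: pip install confluent-kafka zstandard lz4 numpy
import time
import numpy as np
import zstandard
import lz4.frame
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # placeholder
    "group.id": "workload-profiler",         # throwaway consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["mysql.cdc.orders"])     # placeholder topic

samples = []
while len(samples) < 100_000:
    msg = consumer.poll(1.0)
    if msg is None or msg.error() or msg.value() is None:
        continue
    samples.append(msg.value())
consumer.close()

sizes = np.array([len(v) for v in samples])
p50, p95, p99 = np.percentile(sizes, [50, 95, 99]).astype(int)
print(f"mean={int(sizes.mean())}B p50={p50}B p95={p95}B p99={p99}B")

blob = b"".join(samples)   # whole-sample ratio; batch-sized chunks mirror the producer more closely
for name, compress in [("zstd-1", zstandard.ZstdCompressor(level=1).compress),
                       ("lz4", lz4.frame.compress)]:
    t0 = time.perf_counter()
    ratio = len(compress(blob)) / len(blob)
    print(f"{name}: ratio={ratio:.2f}, cpu={time.perf_counter() - t0:.2f}s")
```

If you want a closer approximation of what the producer will actually see, compress batch-sized slices (e.g., 128 KB chunks) instead of the whole blob; ratios on small chunks are usually a bit worse.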
A simple decision procedure
- Pick a target records-per-request
  - Small text records: 50–100
  - Medium JSON (MongoDB): 40–70
  - Large NDJSON (S3): 3–8
- Set batch.size (bytes): batch.size ≈ target_records_per_request * p50_record_size (see the calculator sketch after this list)
  - Keep it ≤ ~512 KB for small/medium; up to 1 MB for large payloads.
  - If batch-size-avg is < 50% in prod, trim by ~25%.
- Set linger.ms
  - Start with the formula above; clamp to 10–20 ms for small/medium, 5–15 ms for large.
  - Validate p99 latency.
- Choose compression.type
  - zstd for JSON/text unless CPU is the bottleneck.
  - lz4 for large or mixed/binary payloads where CPU cost dominates.
  - Avoid none unless payloads are already compressed (e.g., gzip'd blobs).
- Re-measure. Aim for:
  - records-per-request-avg within ±20% of target
  - batch-size-avg at 50–80% of configured batch.size
  - request-rate drop ≥ 2× vs baseline
  - p99 latency within SLO
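If you want the first three steps as code, here is a minimal calculator sketch. The function and its inputs are mine, not part of any connector API: the p50 size and per-partition rate come from your own profiling, and the example numbers roughly mirror the MySQL CDC case (the 6,000 records/s/partition rate is assumed for illustration).

```python
# Turn workload stats into starting batch.size / linger.ms values,
# following the decision procedure above.
def suggest_producer_overrides(p50_bytes: int,
                               records_per_sec_per_partition: float,
                               target_records_per_request: int,
                               latency_budget_ms: int,
                               max_batch_bytes: int = 512 * 1024) -> dict:
    # batch.size ≈ target records per request * p50 record size, capped for small/medium payloads
    batch_size = min(target_records_per_request * p50_bytes, max_batch_bytes)

    # linger.ms ≈ 1000 * target records / per-partition rate, capped by the latency budget
    linger_ms = min(round(1000 * target_records_per_request / records_per_sec_per_partition),
                    latency_budget_ms)

    return {
        "producer.override.batch.size": int(batch_size),
        "producer.override.linger.ms": int(linger_ms),
    }

# Small CDC records (~1.4 KB, using the mean as a stand-in for p50), assumed ~6000 rec/s/partition,
# targeting ~80 records per request with a 20 ms linger budget.
print(suggest_producer_overrides(1_400, 6_000, 80, 20))
# → {'producer.override.batch.size': 112000, 'producer.override.linger.ms': 13}
```

Treat the output as a starting point, then re-measure against the targets above.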
Concrete configs I’d ship (starting points)
MySQL CDC (Debezium)
```
{
  "producer.override.batch.size": 131072,       // 128 KB
  "producer.override.linger.ms": 15,
  "producer.override.compression.type": "zstd",
  "producer.override.buffer.memory": 67108864   // 64 MB (optional headroom)
}
```
MongoDB CDC (Debezium)
```
{
  "producer.override.batch.size": 262144,       // 256 KB
  "producer.override.linger.ms": 15,
  "producer.override.compression.type": "zstd"
}
```
S3 Source (NDJSON ~150 KB/rec)
```
{
  "producer.override.batch.size": 1048576,        // 1 MB
  "producer.override.linger.ms": 10,
  "producer.override.compression.type": "lz4",
  "producer.override.max.request.size": 2097152,  // 2 MB safety margin
  "producer.override.buffer.memory": 134217728    // 128 MB for large records
}
```
Tune topic max.message.bytes accordingly for very large records/batches.
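To ship one of these, PUT the full connector config (including the overrides) back to the Kafka Connect REST API. Below is a minimal sketch with placeholder host and connector names; note that the worker must permit client overrides (e.g., connector.client.config.override.policy=All in the worker config), or Connect will reject the producer.override.* keys.

```python
# Apply per-connector producer overrides via the Kafka Connect REST API.
import requests

CONNECT_URL = "http://connect:8083"    # placeholder Connect worker URL
CONNECTOR = "mysql-cdc-orders"         # placeholder connector name

# PUT /connectors/{name}/config replaces the entire config, so start from the current one.
config = requests.get(f"{CONNECT_URL}/connectors/{CONNECTOR}/config").json()
config.update({
    "producer.override.batch.size": "131072",        # 128 KB
    "producer.override.linger.ms": "15",
    "producer.override.compression.type": "zstd",
})

resp = requests.put(f"{CONNECT_URL}/connectors/{CONNECTOR}/config", json=config)
resp.raise_for_status()
print("applied:", {k: v for k, v in config.items() if k.startswith("producer.override.")})
```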
What the optimizations bought me
Across the three workloads:
- Request rate: ↓ 60–85% (fewer, fuller requests)
- Broker ingress efficiency: ↑ 30–100% (less header/overhead per byte)
- Throughput at same CPU: ↑ 1.3–2.3× (workload-dependent)
- Latency impact: p50 +5–20 ms; p99 remained within typical CDC tolerances
These aren’t theoretical. They follow directly from larger, compressed batches and a linger long enough to fill them—bounded by your latency SLOs.
How to reproduce (fast)
- Sample 100k records from each connector’s output topic.
- Compute size histogram; get p50/p95.
- Compress the sample with zstd/lz4; note ratio + CPU time.
- Apply the decision procedure above; deploy per-connector overrides.
- Watch records-per-request-avg, batch-size-avg, request-rate, and p99 latency for 10–15 minutes under steady load. Iterate; a minimal monitoring sketch follows this list.
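For the watch-and-iterate step, if your Connect worker JVM has a Jolokia agent attached (an assumption; the port, path, and producer client-id below are placeholders), a small polling loop is enough to track the three producer metrics against your targets:

```python
# Poll producer JMX metrics through a Jolokia agent on the Connect worker.
# Find the real client-id by listing kafka.producer:type=producer-metrics,client-id=* MBeans.
import time
import requests

JOLOKIA = "http://connect:8778/jolokia"              # placeholder agent endpoint
CLIENT_ID = "connector-producer-mysql-cdc-orders-0"  # placeholder producer client-id
MBEAN = f"kafka.producer:type=producer-metrics,client-id={CLIENT_ID}"
WATCH = ("records-per-request-avg", "batch-size-avg", "request-rate")

for _ in range(60):                                  # ~10 minutes at 10 s intervals
    attrs = requests.get(f"{JOLOKIA}/read/{MBEAN}").json().get("value", {})
    print({k: round(attrs.get(k, float("nan")), 2) for k in WATCH})
    time.sleep(10)
```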
Automate this with Superstream SuperClient
If you’d rather not build the profiling-and-tuning loop yourself, Superstream SuperClient automates exactly what this post outlines.
- It continuously profiles real producer behavior (record-size distribution, batching efficiency, compression effectiveness, request/record rates per partition).
- Computes the optimal batch.size, linger.ms, and compression.type per connector/topic.
- Either surfaces a reviewed change set or safely applies it by overwriting existing client settings (via interceptor/sidecar) under guardrails: p99 latency budgets, canary rollout, and instant rollback.
You get the same outcomes as the experiments above (fewer requests, lower bandwidth, and higher throughput) plus a report showing records-per-request-avg, batch-size-avg fill %, compression ratios, request-rate deltas, and projected egress savings, so the gains are auditable.
Closing
Don’t ship one set of producer settings across all source connectors. Match the knobs to the payload:
- batch.size ≈ how many bytes you want per request.
- linger.ms ≈ how long you can wait to fill that batch.
- compression.type ≈ bytes-saved per CPU for your data.
If you do only one thing after reading this, go look at records-per-request-avg in prod. If it's single digits for small/medium JSON, you're burning requests (and money) for no value.