Posted by: Idan Asulin
September 13, 2025

Stop Copy-Pasting Producer Settings Across Kafka Source Connectors

Real numbers for batch.size, linger.ms, and compression.type—and a method you can reproduce

When I first operationalized Kafka Source Connectors, I gave them all the same producer settings because… “consistency.” It worked, but it was inefficient. Once I tuned per workload, I cut request rates by 4–7× and shaved double-digit % off broker/network costs.

This post shows:

  • Concrete, reproducible numbers for MySQL CDC, MongoDB CDC, and an S3 Source workload.
  • What “analyze your workload” actually means.
  • A simple decision procedure to pick batch.size, linger.ms, and compression.type that you can apply to any connector.

Note on config names: Per-connector overrides go in the connector config as producer.override.* (e.g., producer.override.linger.ms). Worker-wide defaults live in the Connect worker config as producer.*. I’m showing per-connector here.

TL;DR recommendations (then I justify them)

Defaults (batch.size=16 KB, linger.ms=0, compression=none) are rarely good:

  • With small records, 16 KB caps you at single-digit records per request; 0 ms linger prevents batch fill.
  • With big records, batch.size < record size forces 1 record per request anyway; you lose batching benefits.

What I measured (and why the values make sense)

Cluster: Kafka 3.7, 3× brokers (m5.2xlarge-ish), RF=3, 1 Gbps network
Runs: 10-minute steady state at ~70–80% of connector max throughput
Metrics: Producer JMX (records-per-request-avg, batch-size-avg, compression-rate-avg, request-rate, record-send-rate, bufferpool-wait-time-total), broker ingress bytes/s, p50/p99 end-to-end latency at the topic.

1) MySQL CDC (Debezium) — small JSON records

  • Workload stats: mean 1.4 KB, p95 3.2 KB, highly compressible (zstd ≈ 0.28 ratio on samples).
  • Baseline (defaults)
    batch.size=16 KB, linger.ms=0, compression=none
    records-per-request-avg ≈ 4–7 (can’t fill more), request-rate high, broker ingress ~25 MB/s.
  • Optimized
    batch.size=128 KB, linger.ms=15 ms, compression=zstd
    records-per-request-avg ≈ 60–90, request-rate ↓ ~70%, broker ingress ~44–50 MB/s (same source rate, fewer headers/overheads), p50 latency +10–15 ms, p99 still < 150 ms.
  • Why 128 KB? With 1.4 KB records, target ~70–90 rec/batch needs ~100–125 KB uncompressed buffer. 15 ms linger is enough to fill at typical CDC rates without blowing out tail latency. zstd wins on JSON.
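The arithmetic behind that 128 KB figure, if you want to rerun it with your own record sizes (a quick sanity check, nothing more):

# ~1.4 KB mean records, targeting 70-90 records per batch
mean_record_bytes = 1400
for target in (70, 90):
    print(f"{target} rec/batch -> {target * mean_record_bytes / 1024:.0f} KB uncompressed")
# 70 -> 96 KB, 90 -> 123 KB, hence the 131072-byte (128 KB) batch.size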

2) MongoDB CDC (Debezium) — larger JSON docs

  • Workload stats: mean 3.8 KB, p95 12 KB, compressible (zstd ≈ 0.35 ratio).
  • Baseline (defaults)
    records-per-request-avg ≈ 2–4, visible CPU spikes on brokers due to many small appends.
  • Optimized
    batch.size=256 KB, linger.ms=15 ms, compression=zstd
    records-per-request-avg ≈ 45–65, request-rate ↓ ~60–65%, throughput +~2× vs baseline, p50 +~12 ms, p99 < 200 ms.
  • Why 256 KB? p95 is 12 KB; aiming ~50 rec/batch → ~600 KB uncompressed would be overkill. But batches saturate earlier due to partitions and in-flight limits; empirically 256 KB gets you most of the win without memory pressure.

3) S3 Source — NDJSON ~150 KB per record

  • Workload stats: mean 150 KB, p95 400 KB, mixed compressibility (lz4 ≈ 0.8, zstd ≈ 0.6, but zstd CPU ~2–3× lz4 for marginal wins here).
  • Baseline (defaults)
    Because batch.size (16 KB) < record size, you effectively send one record per request; request-rate is needlessly high, and the network spends too much time on per-request overhead.
  • Optimized
    batch.size=1 MB, linger.ms=10 ms, compression=lz4
    records-per-request-avg ≈ 5–7 (p95 still sends alone), request-rate ↓ ~4–6×, broker ingress +~30–40% vs baseline due to fewer headers and better IO patterns. Latency change negligible (big records dominate).
  • Why lz4? At these sizes, CPU becomes limiting before network; lz4 yields practical wins with minimal CPU. If your instances are CPU-rich and cost of egress matters, zstd can still be worth it.

Guardrails: For big records, ensure max.request.size and topic max.message.bytes comfortably exceed your p95 (plus headers). For tiny records, ensure buffer.memory is adequate if you raise linger/batch.
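A quick arithmetic check makes that guardrail concrete. The 20% headroom below is my own rule of thumb, not a Kafka default:

# p95 from the S3 workload above, plus headroom for headers/batch overhead
p95_record_bytes = 400 * 1024
needed = int(p95_record_bytes * 1.2)          # 20% headroom (assumed)

max_request_size = 2 * 1024 * 1024            # producer.override.max.request.size
topic_max_message_bytes = 2 * 1024 * 1024     # topic-level max.message.bytes

assert needed <= max_request_size, "raise producer max.request.size"
assert needed <= topic_max_message_bytes, "raise topic max.message.bytes"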

What “analyze your workload” actually means

Don’t guess. Measure these five things first; a short profiling sketch for items 1–3 follows the list:

  1. Record size distribution (uncompressed)
    • Collect a statistically meaningful sample (≥100k records).
    • Compute mean, p50, p95, p99 of serialized record size in bytes.
    • This predicts feasible records per batch given a batch.size.
  2. Compressibility
    • On your sample, test zstd -1, lz4, and (optionally) gzip.
    • Capture compression ratio (compressed/uncompressed) and CPU time on your target instance class.
    • Prefer the codec with the best bytes-saved per CPU-second for your payload.
  3. Steady-state record rate
    • Measure records/sec per partition and overall.
    • This plus your target records per batch gives a linger budget:
    • linger.ms ≈ 1000 * target_records_per_batch / steady_records_per_sec_per_partition
    • Cap linger to your end-to-end latency SLO.
  4. Producer fill metrics
    • Watch records-per-request-avg, batch-size-avg (JMX).
    • If batch-size-avg sits < 40–60% of configured batch.size, you’re not filling—reduce batch.size or increase linger.ms.
  5. Backpressure / memory
    • Check bufferpool-wait-time-total and rejected sends.
    • If you see waits, lower linger/batch or raise buffer.memory.
    • Keep delivery.timeout.ms sane when increasing linger.
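Here is a minimal Python sketch for items 1–3. It assumes you already have a sample of serialized record values as bytes (see the sampler near the end of the post) and that the numpy, zstandard, and lz4 packages are installed; the function name and the 128 KB reference point are my own, not anything Kafka ships.

import time
import lz4.frame
import numpy as np
import zstandard

def profile_sample(records, steady_records_per_sec_per_partition,
                   target_records_per_batch, latency_slo_ms):
    # 1) Record size distribution (uncompressed, serialized bytes)
    sizes = np.array([len(r) for r in records])
    p50, p95, p99 = np.percentile(sizes, [50, 95, 99])

    # 2) Compressibility: ratio (compressed/uncompressed) and CPU time per codec
    blob = b"".join(records)
    codecs = {
        "zstd-1": zstandard.ZstdCompressor(level=1).compress,
        "lz4": lz4.frame.compress,
    }
    compression = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        ratio = len(compress(blob)) / len(blob)
        compression[name] = {"ratio": round(ratio, 2),
                             "cpu_s": round(time.perf_counter() - start, 3)}

    # 3) Linger budget: time to accumulate the target batch, capped by the latency SLO
    linger_ms = 1000 * target_records_per_batch / steady_records_per_sec_per_partition
    linger_ms = min(linger_ms, latency_slo_ms)

    return {
        "size_bytes": {"mean": int(sizes.mean()), "p50": int(p50),
                       "p95": int(p95), "p99": int(p99)},
        "compression": compression,
        "linger_budget_ms": round(linger_ms, 1),
        "records_per_128KB_batch": int(131072 // p50),
    }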

A simple decision procedure

  1. Pick a target records-per-request
    • Small text records: 50–100
    • Medium JSON (MongoDB): 40–70
    • Large NDJSON (S3): 3–8
  2. Set batch.size (bytes)
    • batch.size ≈ target_records_per_request * p50_record_size (see the sketch after this list)
    • Keep ≤ ~512 KB for small/medium; up to 1 MB for large payloads.
    • If batch-size-avg is < 50% in prod, trim by ~25%.
  3. Set linger.ms
    • Start with the formula above; clamp to 10–20 ms for small/medium, 5–15 ms for large.
    • Validate p99 latency.
  4. Choose compression.type
    • zstd for JSON/text unless CPU is the bottleneck.
    • lz4 for large or mixed/binary where CPU cost dominates.
    • Avoid none unless payloads are already compressed (e.g., gzip’d blobs).
  5. Re-measure
    • Aim for:
      • records-per-request-avg within ±20% of target
      • batch-size-avg at 50–80% of configured batch.size
      • request-rate clearly down vs baseline
      • p99 latency within SLO
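Steps 1–3 reduce to a few lines of arithmetic. This is a sketch of how I'd encode them; the per-class targets and clamps mirror the ranges above, and the example rate at the bottom is hypothetical, not a measurement from this post.

def pick_producer_settings(p50_record_bytes, steady_records_per_sec_per_partition, workload):
    # Step 1: target records per request, by payload class (midpoints of the ranges above)
    targets = {"small": 80, "medium": 55, "large": 5}
    target = targets[workload]

    # Step 2: batch.size ≈ target * p50, capped per payload class
    cap = 512 * 1024 if workload in ("small", "medium") else 1024 * 1024
    batch_size = min(target * p50_record_bytes, cap)

    # Step 3: linger ≈ time to accumulate the target batch, clamped to the suggested ranges
    linger = 1000 * target / steady_records_per_sec_per_partition
    lo, hi = (10, 20) if workload in ("small", "medium") else (5, 15)
    linger = max(lo, min(hi, linger))

    # Step 4: zstd for JSON/text, lz4 where CPU dominates
    compression = "zstd" if workload in ("small", "medium") else "lz4"

    return {
        "producer.override.batch.size": int(batch_size),
        "producer.override.linger.ms": int(round(linger)),
        "producer.override.compression.type": compression,
    }

# e.g. the MySQL CDC workload above, assuming ~5,000 records/s/partition (hypothetical rate)
print(pick_producer_settings(1400, 5000, "small"))
# {'producer.override.batch.size': 112000, 'producer.override.linger.ms': 16,
#  'producer.override.compression.type': 'zstd'}  -> ~109 KB, which I'd round up to 128 KB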

Concrete configs I’d ship (starting points)

MySQL CDC (Debezium)

{
  "producer.override.batch.size": 131072,
  "producer.override.linger.ms": 15,
  "producer.override.compression.type": "zstd",
  "producer.override.buffer.memory": 67108864
}
(131072 bytes = 128 KB; the 64 MB buffer.memory is optional headroom)

MongoDB CDC (Debezium)

{
  "producer.override.batch.size": 262144,
  "producer.override.linger.ms": 15,
  "producer.override.compression.type": "zstd"
}
(262144 bytes = 256 KB)

S3 Source (NDJSON ~150 KB/rec)

{
  "producer.override.batch.size": 1048576,
  "producer.override.linger.ms": 10,
  "producer.override.compression.type": "lz4",
  "producer.override.max.request.size": 2097152,
  "producer.override.buffer.memory": 134217728
}
(1 MB batches; 2 MB max.request.size as a safety margin; 128 MB buffer.memory for large records)

Tune topic max.message.bytes accordingly for very large records/batches.
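To ship one of these, PUT the full connector config back through the Connect REST API; the PUT replaces the whole config, so merge the overrides into the existing settings rather than posting only the overrides. A minimal sketch, assuming a worker at http://localhost:8083, a connector named mysql-cdc, and a worker whose connector.client.config.override.policy permits client overrides:

import json
import urllib.request

CONNECT_URL = "http://localhost:8083"     # assumption: your Connect REST endpoint
CONNECTOR = "mysql-cdc"                   # hypothetical connector name

# Fetch the current config so the PUT below keeps connector.class, connection
# settings, etc. intact.
with urllib.request.urlopen(f"{CONNECT_URL}/connectors/{CONNECTOR}/config") as resp:
    config = json.load(resp)

# Layer the producer overrides on top (Connect stores config values as strings).
config.update({
    "producer.override.batch.size": "131072",
    "producer.override.linger.ms": "15",
    "producer.override.compression.type": "zstd",
})

req = urllib.request.Request(
    f"{CONNECT_URL}/connectors/{CONNECTOR}/config",
    data=json.dumps(config).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)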

What the optimizations bought me

Across the three workloads:

  • Request rate: ↓ 60–85% (fewer, fuller requests)
  • Broker ingress efficiency: +30–100% (less header/overhead per byte)
  • Throughput at same CPU: 1.3–2.3× (workload-dependent)
  • Latency impact: p50 +5–20 ms; p99 remained within typical CDC tolerances

These aren’t theoretical. They follow directly from larger, compressed batches and a linger long enough to fill them—bounded by your latency SLOs.

How to reproduce (fast)

  1. Sample 100k records from each connector’s output topic (a minimal sampler follows this list).
  2. Compute size histogram; get p50/p95.
  3. Compress the sample with zstd/lz4; note ratio + CPU time.
  4. Apply the decision procedure above; deploy per-connector overrides.
  5. Watch records-per-request-avg, batch-size-avg, request-rate, and p99 latency for 10–15 minutes under steady load. Iterate.
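For steps 1–2, here is a minimal sampler using the confluent-kafka Python client (my choice of client is an assumption; any consumer works). The topic name is hypothetical; feed the collected bytes into the profiling sketch from earlier.

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # assumption: adjust for your cluster
    "group.id": "record-size-profiler",
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,
})
consumer.subscribe(["mysql.inventory.customers"])  # hypothetical CDC output topic

sample = []
while len(sample) < 100_000:
    msg = consumer.poll(1.0)
    if msg is None or msg.error() or msg.value() is None:
        continue
    sample.append(msg.value())               # serialized record value, in bytes
consumer.close()
# pass `sample` to profile_sample(...) above for p50/p95 sizes and codec ratios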

Automate this with Superstream SuperClient

If you’d rather not build the profiling-and-tuning loop yourself, Superstream SuperClient automates exactly what this post outlines.

  • It continuously profiles real producer behavior (record-size distribution, batching efficiency, compression effectiveness, request/record rates per partition).
  • Computes the optimal batch.size, linger.ms, and compression.type per connector/topic.
  • Either surfaces a reviewed change set or safely applies it by overwriting existing client settings (via interceptor/sidecar) under guardrails—p99 latency budgets, canary rollout, and instant rollback.

You get the same outcomes as the experiments above—fewer requests, lower bandwidth, and higher throughput—plus a report showing records-per-request-avg, batch-size-avg fill %, compression ratios, request-rate deltas, and projected egress savings so the gains are auditable.

Closing

Don’t ship one set of producer settings across all source connectors. Match the knobs to the payload:

  • batch.size ≈ how many bytes you want per request.
  • linger.ms ≈ how long you can wait to fill that batch.
  • compression.type ≈ bytes-saved per CPU for your data.

If you do only one thing after reading this, go look at records-per-request-avg in prod. If it’s single digits for small/medium JSON, you’re burning requests—and money—for no value.
