Posted by: Idan Asulin
September 13, 2025

Stop Copy-Pasting Producer Settings Across Kafka Source Connectors

Real numbers for batch.size, linger.ms, and compression.type—and a method you can reproduce

When I first operationalized Kafka Source Connectors, I gave them all the same producer settings because… “consistency.” It worked, but it was inefficient. Once I tuned per workload, I cut request rates by 4–7× and shaved double-digit % off broker/network costs.

This post shows:

  • Concrete, reproducible numbers for MySQL CDC, MongoDB CDC, and an S3 Source workload.
  • What “analyze your workload” actually means.
  • A simple decision procedure to pick batch.size, linger.ms, and compression.type that you can apply to any connector.

Note on config names: Per-connector overrides go in the connector config as producer.override.* (e.g., producer.override.linger.ms). Worker-wide defaults live in the Connect worker config as producer.*. I’m showing per-connector here.

TL;DR recommendations (then I justify them)

Defaults (batch.size=16 KB, linger.ms=0, compression=none) are rarely good:

  • With small records, 16 KB caps you at single-digit records per request; 0 ms linger prevents batch fill.
  • With big records, batch.size < record size forces 1 record per request anyway; you lose batching benefits.

What I measured (and why the values make sense)

Cluster: Kafka 3.7, 3× brokers (m5.2xlarge-ish), RF=3, 1 Gbps network
Runs: 10-minute steady state at ~70–80% of connector max throughput
Metrics: Producer JMX (records-per-request-avg, batch-size-avg, compression-rate-avg, request-rate, record-send-rate, bufferpool-wait-time-total), broker ingress bytes/s, p50/p99 end-to-end latency at the topic.

1) MySQL CDC (Debezium) — small JSON records

  • Workload stats: mean 1.4 KB, p95 3.2 KB, highly compressible (zstd ≈ 0.28 ratio on samples).
  • Baseline (defaults)
    batch.size=16 KB, linger.ms=0, compression=none
    records-per-request-avg ≈ 4–7 (can’t fill more), request-rate high, broker ingress ~25 MB/s.
  • Optimized
    batch.size=128 KB, linger.ms=15 ms, compression=zstd
    records-per-request-avg ≈ 60–90, request-rate ↓ ~70%, broker ingress ~44–50 MB/s (same source rate, fewer headers/overheads), p50 latency +10–15 ms, p99 still < 150 ms.
  • Why 128 KB? With 1.4 KB records, target ~70–90 rec/batch needs ~100–125 KB uncompressed buffer. 15 ms linger is enough to fill at typical CDC rates without blowing out tail latency. zstd wins on JSON.
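The arithmetic behind that 128 KB figure, if you want to rerun it with your own record sizes (a quick sanity check, nothing more):

# ~1.4 KB mean records, targeting 70-90 records per batch
mean_record_bytes = 1400
for target in (70, 90):
    print(f"{target} rec/batch -> {target * mean_record_bytes / 1024:.0f} KB uncompressed")
# 70 -> 96 KB, 90 -> 123 KB, hence the 131072-byte (128 KB) batch.size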

2) MongoDB CDC (Debezium) — larger JSON docs

  • Workload stats: mean 3.8 KB, p95 12 KB, compressible (zstd ≈ 0.35 ratio).
  • Baseline (defaults)
    records-per-request-avg ≈ 2–4, visible CPU spikes on brokers due to many small appends.
  • Optimized
    batch.size=256 KB, linger.ms=15 ms, compression=zstd
    records-per-request-avg ≈ 45–65, request-rate ↓ ~60–65%, throughput +~2× vs baseline, p50 +~12 ms, p99 < 200 ms.
  • Why 256 KB? p95 is 12 KB; aiming ~50 rec/batch → ~600 KB uncompressed would be overkill. But batches saturate earlier due to partitions and in-flight limits; empirically 256 KB gets you most of the win without memory pressure.

3) S3 Source — NDJSON ~150 KB per record

  • Workload stats: mean 150 KB, p95 400 KB, mixed compressibility (lz4 ≈ 0.8, zstd ≈ 0.6, but zstd CPU ~2–3× lz4 for marginal wins here).
  • Baseline (defaults)
    Because batch.size (16 KB) < record size, you effectively send one record per request; request-rate is needlessly high, and the network spends too much time on per-request overhead.
  • Optimized
    batch.size=1 MB, linger.ms=10 ms, compression=lz4
    records-per-request-avg ≈ 5–7 (p95 still sends alone), request-rate ↓ ~4–6×, broker ingress +~30–40% vs baseline due to fewer headers and better IO patterns. Latency change negligible (big records dominate).
  • Why lz4? At these sizes, CPU becomes limiting before network; lz4 yields practical wins with minimal CPU. If your instances are CPU-rich and cost of egress matters, zstd can still be worth it.

Guardrails: For big records, ensure max.request.size and topic max.message.bytes comfortably exceed your p95 (plus headers). For tiny records, ensure buffer.memory is adequate if you raise linger/batch.
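A quick arithmetic check makes that guardrail concrete. The 20% headroom below is my own rule of thumb, not a Kafka default:

# p95 from the S3 workload above, plus headroom for headers/batch overhead
p95_record_bytes = 400 * 1024
needed = int(p95_record_bytes * 1.2)          # 20% headroom (assumed)

max_request_size = 2 * 1024 * 1024            # producer.override.max.request.size
topic_max_message_bytes = 2 * 1024 * 1024     # topic-level max.message.bytes

assert needed <= max_request_size, "raise producer max.request.size"
assert needed <= topic_max_message_bytes, "raise topic max.message.bytes"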

What “analyze your workload” actually means

Don’t guess. Measure these five things first; a short profiling sketch for items 1–3 follows the list:

  1. Record size distribution (uncompressed)
    • Collect a statistically meaningful sample (≥100k records).
    • Compute mean, p50, p95, p99 of serialized record size in bytes.
    • This predicts feasible records per batch given a batch.size.
  2. Compressibility
    • On your sample, test zstd -1, lz4, and (optionally) gzip.
    • Capture compression ratio (compressed/uncompressed) and CPU time on your target instance class.
    • Prefer the codec with the best bytes-saved per CPU-second for your payload.
  3. Steady-state record rate
    • Measure records/sec per partition and overall.
    • This plus your target records per batch gives a linger budget:
    • linger.ms ≈ 1000 * target_records_per_batch / steady_records_per_sec_per_partition
    • Cap linger to your end-to-end latency SLO.
  4. Producer fill metrics
    • Watch records-per-request-avg, batch-size-avg (JMX).
    • If batch-size-avg sits < 40–60% of configured batch.size, you’re not filling—reduce batch.size or increase linger.ms.
  5. Backpressure / memory
    • Check bufferpool-wait-time-total and rejected sends.
    • If you see waits, lower linger/batch or raise buffer.memory.
    • Keep delivery.timeout.ms sane when increasing linger.
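Here is a minimal Python sketch for items 1–3. It assumes you already have a sample of serialized record values as bytes (see the sampler near the end of the post) and that the numpy, zstandard, and lz4 packages are installed; the function name and the 128 KB reference point are my own, not anything Kafka ships.

import time
import lz4.frame
import numpy as np
import zstandard

def profile_sample(records, steady_records_per_sec_per_partition,
                   target_records_per_batch, latency_slo_ms):
    # 1) Record size distribution (uncompressed, serialized bytes)
    sizes = np.array([len(r) for r in records])
    p50, p95, p99 = np.percentile(sizes, [50, 95, 99])

    # 2) Compressibility: ratio (compressed/uncompressed) and CPU time per codec
    blob = b"".join(records)
    codecs = {
        "zstd-1": zstandard.ZstdCompressor(level=1).compress,
        "lz4": lz4.frame.compress,
    }
    compression = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        ratio = len(compress(blob)) / len(blob)
        compression[name] = {"ratio": round(ratio, 2),
                             "cpu_s": round(time.perf_counter() - start, 3)}

    # 3) Linger budget: time to accumulate the target batch, capped by the latency SLO
    linger_ms = 1000 * target_records_per_batch / steady_records_per_sec_per_partition
    linger_ms = min(linger_ms, latency_slo_ms)

    return {
        "size_bytes": {"mean": int(sizes.mean()), "p50": int(p50),
                       "p95": int(p95), "p99": int(p99)},
        "compression": compression,
        "linger_budget_ms": round(linger_ms, 1),
        "records_per_128KB_batch": int(131072 // p50),
    }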

A simple decision procedure

  1. Pick a target records-per-request
    • Small text records: 50–100
    • Medium JSON (MongoDB): 40–70
    • Large NDJSON (S3): 3–8
  2. Set batch.size (bytes)
    • batch.size ≈ target_records_per_request * p50_record_size (see the sketch after this list)
    • Keep ≤ ~512 KB for small/medium; up to 1 MB for large payloads.
    • If batch-size-avg is < 50% in prod, trim by ~25%.
  3. Set linger.ms
    • Start with the formula above; clamp to 10–20 ms for small/medium, 5–15 ms for large.
    • Validate p99 latency.
  4. Choose compression.type
    • zstd for JSON/text unless CPU is the bottleneck.
    • lz4 for large or mixed/binary where CPU cost dominates.
    • Avoid none unless payloads are already compressed (e.g., gzip’d blobs).
  5. Re-measure
    • Aim for:
      • records-per-request-avg within ±20% of target
      • batch-size-avg at 50–80% of configured batch.size
      • request-rate clearly down vs baseline
      • p99 latency within SLO
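Steps 1–3 reduce to a few lines of arithmetic. This is a sketch of how I'd encode them; the per-class targets and clamps mirror the ranges above, and the example rate at the bottom is hypothetical, not a measurement from this post.

def pick_producer_settings(p50_record_bytes, steady_records_per_sec_per_partition, workload):
    # Step 1: target records per request, by payload class (midpoints of the ranges above)
    targets = {"small": 80, "medium": 55, "large": 5}
    target = targets[workload]

    # Step 2: batch.size ≈ target * p50, capped per payload class
    cap = 512 * 1024 if workload in ("small", "medium") else 1024 * 1024
    batch_size = min(target * p50_record_bytes, cap)

    # Step 3: linger ≈ time to accumulate the target batch, clamped to the suggested ranges
    linger = 1000 * target / steady_records_per_sec_per_partition
    lo, hi = (10, 20) if workload in ("small", "medium") else (5, 15)
    linger = max(lo, min(hi, linger))

    # Step 4: zstd for JSON/text, lz4 where CPU dominates
    compression = "zstd" if workload in ("small", "medium") else "lz4"

    return {
        "producer.override.batch.size": int(batch_size),
        "producer.override.linger.ms": int(round(linger)),
        "producer.override.compression.type": compression,
    }

# e.g. the MySQL CDC workload above, assuming ~5,000 records/s/partition (hypothetical rate)
print(pick_producer_settings(1400, 5000, "small"))
# {'producer.override.batch.size': 112000, 'producer.override.linger.ms': 16,
#  'producer.override.compression.type': 'zstd'}  -> ~109 KB, which I'd round up to 128 KB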

Concrete configs I’d ship (starting points)

MySQL CDC (Debezium)

{
  "producer.override.batch.size": 131072,
  "producer.override.linger.ms": 15,
  "producer.override.compression.type": "zstd",
  "producer.override.buffer.memory": 67108864
}
(131072 bytes = 128 KB; the 64 MB buffer.memory is optional headroom)

MongoDB CDC (Debezium)

{
  "producer.override.batch.size": 262144,
  "producer.override.linger.ms": 15,
  "producer.override.compression.type": "zstd"
}
(262144 bytes = 256 KB)

S3 Source (NDJSON ~150 KB/rec)

{
  "producer.override.batch.size": 1048576,
  "producer.override.linger.ms": 10,
  "producer.override.compression.type": "lz4",
  "producer.override.max.request.size": 2097152,
  "producer.override.buffer.memory": 134217728
}
(1 MB batches; 2 MB max.request.size as a safety margin; 128 MB buffer.memory for large records)

Tune topic max.message.bytes accordingly for very large records/batches.
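To ship one of these, PUT the full connector config back through the Connect REST API; the PUT replaces the whole config, so merge the overrides into the existing settings rather than posting only the overrides. A minimal sketch, assuming a worker at http://localhost:8083, a connector named mysql-cdc, and a worker whose connector.client.config.override.policy permits client overrides:

import json
import urllib.request

CONNECT_URL = "http://localhost:8083"     # assumption: your Connect REST endpoint
CONNECTOR = "mysql-cdc"                   # hypothetical connector name

# Fetch the current config so the PUT below keeps connector.class, connection
# settings, etc. intact.
with urllib.request.urlopen(f"{CONNECT_URL}/connectors/{CONNECTOR}/config") as resp:
    config = json.load(resp)

# Layer the producer overrides on top (Connect stores config values as strings).
config.update({
    "producer.override.batch.size": "131072",
    "producer.override.linger.ms": "15",
    "producer.override.compression.type": "zstd",
})

req = urllib.request.Request(
    f"{CONNECT_URL}/connectors/{CONNECTOR}/config",
    data=json.dumps(config).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)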

What the optimizations bought me

Across the three workloads:

  • Request rate: ↓ 60–85% (fewer, fuller requests)
  • Broker ingress efficiency: +30–100% (less header/overhead per byte)
  • Throughput at same CPU: 1.3–2.3× (workload-dependent)
  • Latency impact: p50 +5–20 ms; p99 remained within typical CDC tolerances

These aren’t theoretical. They follow directly from larger, compressed batches and a linger long enough to fill them—bounded by your latency SLOs.

How to reproduce (fast)

  1. Sample 100k records from each connector’s output topic (a minimal sampler follows this list).
  2. Compute size histogram; get p50/p95.
  3. Compress the sample with zstd/lz4; note ratio + CPU time.
  4. Apply the decision procedure above; deploy per-connector overrides.
  5. Watch records-per-request-avg, batch-size-avg, request-rate, and p99 latency for 10–15 minutes under steady load. Iterate.
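For steps 1–2, here is a minimal sampler using the confluent-kafka Python client (my choice of client is an assumption; any consumer works). The topic name is hypothetical; feed the collected bytes into the profiling sketch from earlier.

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # assumption: adjust for your cluster
    "group.id": "record-size-profiler",
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,
})
consumer.subscribe(["mysql.inventory.customers"])  # hypothetical CDC output topic

sample = []
while len(sample) < 100_000:
    msg = consumer.poll(1.0)
    if msg is None or msg.error() or msg.value() is None:
        continue
    sample.append(msg.value())               # serialized record value, in bytes
consumer.close()
# pass `sample` to profile_sample(...) above for p50/p95 sizes and codec ratios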

Automate this with Superstream SuperClient

If you’d rather not build the profiling-and-tuning loop yourself, Superstream SuperClient automates exactly what this post outlines.

  • It continuously profiles real producer behavior (record-size distribution, batching efficiency, compression effectiveness, request/record rates per partition).
  • Computes the optimal batch.size, linger.ms, and compression.type per connector/topic.
  • Either surfaces a reviewed change set or safely applies it by overwriting existing client settings (via interceptor/sidecar) under guardrails—p99 latency budgets, canary rollout, and instant rollback.

You get the same outcomes as the experiments above—fewer requests, lower bandwidth, and higher throughput—plus a report showing records-per-request-avg, batch-size-avg fill %, compression ratios, request-rate deltas, and projected egress savings so the gains are auditable.

Closing

Don’t ship one set of producer settings across all source connectors. Match the knobs to the payload:

  • batch.size ≈ how many bytes you want per request.
  • linger.ms ≈ how long you can wait to fill that batch.
  • compression.type ≈ bytes-saved per CPU for your data.

If you do only one thing after reading this, go look at records-per-request-avg in prod. If it’s single digits for small/medium JSON, you’re burning requests—and money—for no value.
