A few gotchas when running Debezium on MSK Connect
Introduction
A while ago, my team replaced our previous data pipeline with a CDC setup based on Amazon MSK Connect running the Debezium connector for PostgreSQL, streaming changes from our RDS PostgreSQL database to a Kafka topic. I talked about the broader journey at AWS Summit Poland 2025, but the talk didn't have time for everything I'd have liked to share, so I decided to dive into some of those details in this blog post.
I'm assuming you already know roughly what Debezium is and what MSK Connect does. If you don't, the MSK Connect docs and Debezium documentation are pretty good starting points.
Prerequisites that catch people out
Before any tip below matters, three things have to be in place:
- Logical replication has to be enabled on the RDS instance. That means rds.logical_replication=1 in the parameter group, which requires a reboot.
- The DB user used by Debezium needs the rds_superuser and rds_replication roles. Both are required; rds_replication alone isn't enough on RDS.
- You need a custom plugin package uploaded to S3. MSK Connect doesn't ship with Debezium out of the box. You bundle the connector JARs into a ZIP, upload it to S3, and reference it as a custom plugin in the connector config.
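On the database side, the checks and grants look like this — a sketch assuming the connector's user is called debezium:

```sql
-- Confirm logical replication is on; after the parameter group
-- change and reboot this should return 'logical'
SHOW wal_level;

-- Grant the required RDS roles to the connector's user
GRANT rds_superuser TO debezium;
GRANT rds_replication TO debezium;
```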
Knowing this beforehand would have saved me at least a few hours of debugging; unfortunately, the logs and error messages aren't always super clear.
Tips and Gotchas
Heartbeats matter for low-traffic tables
Debezium tracks its position in PostgreSQL using a logical replication slot. As long as Debezium is making progress through the WAL, Postgres can recycle the WAL segments behind that slot. If Debezium is running but isn't seeing any changes (because the tables it's watching aren't getting many updates), the slot doesn't advance, and Postgres keeps WAL segments around. On a database that's busy in tables you don't capture but quiet in the ones you do, WAL can grow quite significantly over time.
You can address this with Debezium's heartbeat mechanism. Set heartbeat.interval.ms in the connector config and Debezium will periodically emit heartbeat events; pair it with heartbeat.action.query so it also writes to a heartbeat table, generating changes that let the slot advance.
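A minimal version of that config might look like the following; the interval and the heartbeat table name are illustrative, and the table is assumed to already exist:

```properties
heartbeat.interval.ms=30000
heartbeat.action.query=INSERT INTO public.debezium_heartbeat (id, ts) VALUES (1, now()) ON CONFLICT (id) DO UPDATE SET ts = now()
```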
Use signals to add tables without restarting the connector
When you initially configure Debezium, you tell it which tables to track and it performs an initial snapshot before switching to streaming. But what if you need to start tracking a new table afterwards?
The way to do it is with signals: create a debezium_signal table in your database, configure Debezium to watch it, and issue an incremental snapshot signal whenever you need to add a table:
INSERT INTO public.debezium_signal
(id, type, data)
VALUES (
'id-1',
'execute-snapshot',
'{
"data-collections":[
"public.orders"
],
"type":"INCREMENTAL"
}');

The same signal can also be sent as a Kafka message to a dedicated signals topic if you'd rather not write to the database. Either way, the connector picks up the signal, snapshots the new table incrementally without blocking the streaming of existing tables, and merges the snapshot back into the main stream. Much nicer than a full restart.
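If you go the database route, the signal table has to exist and be declared in the connector config. A minimal schema following the shape Debezium expects (the table name itself is up to you):

```sql
-- id: arbitrary unique identifier for each signal
-- type: the signal type, e.g. 'execute-snapshot'
-- data: JSON payload with the signal's parameters
CREATE TABLE public.debezium_signal (
    id   VARCHAR(42) PRIMARY KEY,
    type VARCHAR(32) NOT NULL,
    data VARCHAR(2048)
);
```

Then point the connector at it with signal.data.collection=public.debezium_signal, and make sure the table is included in table.include.list so Debezium actually sees the inserts.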
Turn off schema embedding if you don't need it
By default, every Debezium event includes a copy of the schema in the JSON payload. The intent is to let consumers parse messages without an external schema registry. The downside is that the schema can often be a significant part of the payload, and if you control the consumers, you can just share the schema with them directly.
If your consumers don't actually use the embedded schema, you can turn it off with the following config:
key.converter.schemas.enable=false
value.converter.schemas.enable=false
Max tasks is always 1 for PostgreSQL
This is specific to PostgreSQL, not a limitation of Debezium itself: only one client can read from a given replication slot at a time, which means you cannot scale Debezium for PostgreSQL by adding more tasks. Always set tasks.max to 1.
Plan for at-least-once delivery
Debezium gives you at-least-once delivery, not exactly-once. On connector restart, on rebalance, and on most kinds of failure, you can see duplicate events on the topic. Make sure your consumers account for this, either with a deduplication mechanism or with idempotent processing of events.
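If your consumer materializes events into another database, idempotent processing can be as simple as an upsert keyed on the source row's primary key. A sketch — the target table and columns here are hypothetical:

```sql
-- Replaying the same event produces the same row state,
-- so duplicate deliveries are harmless.
INSERT INTO orders_replica (id, status, updated_at)
VALUES (42, 'shipped', now())
ON CONFLICT (id) DO UPDATE
SET status = EXCLUDED.status,
    updated_at = EXCLUDED.updated_at;
```

If out-of-order duplicates are possible on your topic, you can additionally guard the update with a WHERE clause comparing timestamps, so a replayed older event can't overwrite a newer state.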
Mind your partition keys
Kafka guarantees ordering only within a partition. By default, Debezium sets the event key to the primary key of the source table, and Kafka assigns the partition based on that key, so events for the same row always land on the same partition and arrive in order, which is usually what you want.
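If the default key doesn't match the ordering you need — say you want all events for a given customer on one partition, regardless of which row changed — Debezium lets you override the key with message.key.columns. The table and column below are hypothetical:

```properties
message.key.columns=public.orders:customer_id
```

Keep in mind this changes what "in order" means for consumers: you trade per-row ordering for per-customer ordering.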
Logs are how you debug everything
MSK Connect lets you deliver connector logs to CloudWatch, S3, or Kinesis Firehose, and I cannot recommend enabling that highly enough. Failure modes for these connectors are not always obvious, and the logs are particularly helpful when you're first setting everything up. Connection failures to RDS, IAM auth issues with MSK, plugin loading errors, and replication slot conflicts all show up there with reasonably clear messages. There are still cases where the logs are a bit cryptic, but that's another problem.
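Log delivery is configured as part of the connector definition, not afterwards. With the kafkaconnect API, the relevant fragment of the create-connector request looks roughly like this — the log group name is a placeholder, and I'd double-check the exact field names against the current API reference:

```json
"logDelivery": {
  "workerLogDelivery": {
    "cloudWatchLogs": {
      "enabled": true,
      "logGroup": "/msk-connect/debezium-postgres"
    }
  }
}
```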
Summary
MSK Connect with Debezium is a solid CDC setup once you get past the initial wiring, but there are enough small details to stay on top of that the path from "got it running" to "comfortable in production" takes longer than you'd expect. Heartbeats, signals, schema embedding, the single-task limit, at-least-once delivery, partition keys, and logging are the seven things I'd make sure to address before considering this kind of pipeline production-ready. Thanks for reading!