CDC (Change Data Capture) is a process of identifying and tracking data change events in databases. CDC is an efficient mechanism to achieve reliable and scalable data replication across different systems.
In the 6.7 release of Diffusion™, we introduced a brand-new CDC adapter that lets users replicate data from databases into a Diffusion server (or server cluster). The adapter uses the Debezium engine to connect to databases. Debezium uses log-based change data capture: it reads the database's transaction logs to identify row-level change events. The adapter processes these events and publishes them to a specific Diffusion topic. Debezium provides connectors for many different databases. The Diffusion CDC adapter initially supports fully tested MySQL and PostgreSQL connectors out of the box. Connecting to other databases may work if the relevant connector .jar is added to the classpath when running the adapter, but this is not yet supported.
Each row-level data change event (insert/update/delete) captured by Debezium is processed and published to a JSON Diffusion topic. More details about mapping these events to Diffusion topics can be found below. Every event contains details of the data change as well as its schema. The schema of the data (that is, of a table) can optionally be published to a separate Diffusion topic specific to that table. This is configurable and is disabled by default.
With Debezium at its core, the adapter supports any configuration option supported by Debezium. Users can apply these Debezium configurations (e.g. those for a MySQL database) in the adapter as required. Debezium's configuration parameters can include or exclude specific databases, tables and columns from tracking, so a single adapter can be configured to track different tables in a database with different sets of configurations. Similarly, any restrictions and requirements for using Debezium also apply to this adapter.
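As a rough illustration of include/exclude filtering, the sketch below writes a few Debezium MySQL connector filter properties to a scratch file. The property names are those of the Debezium 1.x MySQL connector and may differ in other versions; the table and column names come from the Debezium example `inventory` database. How these properties are embedded in the adapter's own configuration file is not shown here, and `props_file` is only a placeholder name.

```shell
# Scratch file holding an illustrative set of Debezium filter properties.
props_file="$(mktemp)"

cat > "$props_file" <<'EOF'
# Capture changes only from the 'inventory' database
database.include.list=inventory
# Track just these tables (fully qualified as database.table)
table.include.list=inventory.customers,inventory.orders
# Leave this column out of the change events
column.exclude.list=inventory.customers.email
EOF

# Show the resulting property set.
cat "$props_file"
```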
Each row-specific event captured by Debezium is published to a Diffusion topic. Each row of a table is identified by its primary key, so if a table does not have a primary key defined, updates to that table are ignored. The adapter supports four different ways to map these events to a Diffusion topic. Each corresponds to an option in the adapter's configuration.
N.B. If a table has a composite primary key, the values of those keys are escaped and concatenated with ‘,’ to form the complete primary key, which is used in the Object and Row topic mappings, as defined above.
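To make the composite-key note concrete, here is a minimal sketch of joining key values with ‘,’. The exact escaping the adapter applies is not described here, so the backslash-escaping below is an assumption for illustration only, and `escape_key` and `join_keys` are hypothetical helper names, not part of the adapter.

```shell
# ASSUMPTION: escape '\' and ',' with a backslash; the adapter's real scheme may differ.
escape_key() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/,/\\,/g'
}

# Join all arguments into one composite key, separated by ','.
join_keys() {
  joined=""
  first=1
  for part in "$@"; do
    escaped=$(escape_key "$part")
    if [ "$first" -eq 1 ]; then
      joined=$escaped
      first=0
    else
      joined="$joined,$escaped"
    fi
  done
  printf '%s\n' "$joined"
}

join_keys 42 7               # prints: 42,7
join_keys 1001 "Smith, J"    # prints: 1001,Smith\, J
```

Escaping the separator inside each value keeps two different key combinations from collapsing into the same concatenated string.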
The adapter has several other configuration options. All of these can be viewed here.
Details about how to run the adapter can be found here.
Details about managing and monitoring the CDC adapter via the Diffusion console can be found here.
docker run -it --rm --name mysql -p 3306:3306 -e MYSQL_ROOT_PASSWORD=debezium -e MYSQL_USER=mysqluser -e MYSQL_PASSWORD=mysqlpw debezium/example-mysql:1.5
docker run -it --rm --name mysqlterm --link mysql mysql:5.7 sh -c 'exec mysql -h"$MYSQL_PORT_3306_TCP_ADDR" -P"$MYSQL_PORT_3306_TCP_PORT" -uroot -p"$MYSQL_ENV_MYSQL_ROOT_PASSWORD"'
GRANT ALL PRIVILEGES ON inventory.* TO 'mysqluser'@'%';
java -jar cdc-adapter-6.7.jar ./configuration.json
If you want to run the adapter with only bootstrap configurations, you can pass them as system properties.
java -Dgateway.client.id=testCdcAdapter -Ddiffusion.gateway.server.url=ws://localhost:8080 -Ddiffusion.gateway.principal=admin -Ddiffusion.gateway.password=password -jar cdc-adapter-6.7.jar
Once the adapter is up and running, navigate to the Diffusion console’s ‘Topic browser’ view to see the data in the database replicated to JSON Diffusion topics, according to the provided configuration.
N.B. This will be visible only if snapshotting (fetching a snapshot of the existing data from the database) is enabled in the configuration; it is disabled by default.
Data inserted or updated in the database will then be reflected in the Diffusion topics.
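One way to try this is to change a row in the example database and watch the topics in the console. The sketch below assumes the `mysql` container and root password from the `docker run` command above, and the `customers` table that ships with the Debezium example `inventory` database. Which topic receives the update depends on the mapping configured for the adapter.

```shell
# A single-row update against the Debezium example 'inventory' database.
SQL="UPDATE customers SET first_name = 'Anne Marie' WHERE id = 1004;"

# Run it inside the 'mysql' container if it is up; otherwise just print the statement.
if command -v docker >/dev/null 2>&1 && docker ps --format '{{.Names}}' | grep -qx mysql; then
  docker exec -i mysql mysql -uroot -pdebezium inventory -e "$SQL"
else
  echo "mysql container not running; run this statement manually: $SQL"
fi
```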