Script used at SneakersAPI.dev to replicate data from ClickHouse to PostgreSQL. Which tables are synced, and how, is defined in a YAML configuration file.
Key features:
- Replicates data from ClickHouse to PostgreSQL.
- Manages primary keys, indexes, and destination column types.
- Time-series data can be synced via cursor to avoid full table scans.
- Batch processing coupled with temporary tables, run in a separate thread and connection.
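The cursor and temporary-table features above can be illustrated with a minimal sketch. This is not the tool's actual code: the function names (`cursorQuery`, `upsertFromTemp`), the table and column names, and the exact SQL shape are assumptions chosen for illustration. The idea is that the ClickHouse read filters on a cursor column so only new rows are scanned, and each batch staged in a temporary table is merged into the destination with `INSERT ... ON CONFLICT`.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteAll wraps each identifier in double quotes (PostgreSQL style).
// Assumption: identifiers contain no embedded double quotes.
func quoteAll(names []string) []string {
	out := make([]string, len(names))
	for i, n := range names {
		out[i] = `"` + n + `"`
	}
	return out
}

// cursorQuery builds a ClickHouse SELECT that reads only rows newer than
// the last synced cursor value, which is bound later to the single "?".
func cursorQuery(table, cursorCol string, cols []string) string {
	return fmt.Sprintf("SELECT %s FROM %s WHERE %s > ? ORDER BY %s",
		strings.Join(cols, ", "), table, cursorCol, cursorCol)
}

// upsertFromTemp builds a PostgreSQL statement that merges a staged batch
// from a temporary table into the destination table. Non-key columns are
// overwritten on conflict; if every column is part of the key, the
// conflicting row is left untouched.
func upsertFromTemp(dest, temp string, cols, pk []string) string {
	isPK := make(map[string]bool, len(pk))
	for _, k := range pk {
		isPK[k] = true
	}
	var sets []string
	for _, c := range cols {
		if !isPK[c] {
			sets = append(sets, fmt.Sprintf(`"%s" = EXCLUDED."%s"`, c, c))
		}
	}
	action := "DO NOTHING"
	if len(sets) > 0 {
		action = "DO UPDATE SET " + strings.Join(sets, ", ")
	}
	colList := strings.Join(quoteAll(cols), ", ")
	return fmt.Sprintf(`INSERT INTO "%s" (%s) SELECT %s FROM "%s" ON CONFLICT (%s) %s`,
		dest, colList, colList, temp, strings.Join(quoteAll(pk), ", "), action)
}

func main() {
	// Hypothetical table and columns, for illustration only.
	cols := []string{"id", "price", "updated_at"}
	fmt.Println(cursorQuery("prices", "updated_at", cols))
	fmt.Println(upsertFromTemp("prices", "prices_tmp", cols, []string{"id"}))
}
```

Staging each batch in a temporary table keeps the merge a single set-based statement, which is generally cheaper than issuing one upsert per row.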
About performance:
Measured from table creation to last upsert, with a batch size of 50k rows:
- 800k rows with 5 columns: around 10s, 80k rows/s
- 170k rows with 18 columns: around 5s, 34k rows/s
Note: This tool might not be the best fit for high volumes of data. We have only tested it with fewer than 10 million rows.
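The figures above can be checked with a little arithmetic. The sketch below assumes that a trailing partial batch is sent as its own batch; `batchCount` and `rowsPerSecond` are hypothetical helpers, not part of the tool, and the durations are the approximate ones quoted above.

```go
package main

import "fmt"

// batchCount returns how many batches are needed to move the given number
// of rows when each batch holds at most batchSize rows (ceiling division).
func batchCount(rows, batchSize int) int {
	return (rows + batchSize - 1) / batchSize
}

// rowsPerSecond returns the average throughput of a run.
func rowsPerSecond(rows int, seconds float64) float64 {
	return float64(rows) / seconds
}

func main() {
	// 800k rows in about 10s, 50k rows per batch.
	fmt.Println(batchCount(800_000, 50_000), rowsPerSecond(800_000, 10))
	// 170k rows in about 5s, 50k rows per batch.
	fmt.Println(batchCount(170_000, 50_000), rowsPerSecond(170_000, 5))
}
```

Under that assumption the first run takes 16 batches and the second takes 4, with averages of 80,000 and 34,000 rows per second.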
Configuration is done via a YAML file. See config.example.yml for reference.
export CLICKHOUSE_DSN=<clickhouse_dsn>
export DATABASE_URL=<database_url>
go run . [-only=<table_name>] [-drop=<table_name>] [-config=<path>]
-only=<table_name>
: Avoid running all tables and only process the one specified.
-drop=<table_name>
: Drop the table after processing and reset cursor, if any.
-config=<path>
: Path to the configuration file. Defaults to config.yml.
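For readers curious how flags of this shape are typically handled in Go, here is a minimal sketch using the standard `flag` package. The `options` struct and `parseArgs` function are hypothetical names, and the tool's real parsing code may differ; only the flag names and the `config.yml` default come from the descriptions above.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options mirrors the three command-line flags described above.
type options struct {
	Only   string // process only this table
	Drop   string // drop this table after processing and reset its cursor
	Config string // path to the YAML configuration file
}

// parseArgs parses the given arguments (without the program name).
// It returns an error instead of exiting so callers can decide what to do.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("replication", flag.ContinueOnError)
	fs.StringVar(&o.Only, "only", "", "only process the specified table")
	fs.StringVar(&o.Drop, "drop", "", "drop the table after processing and reset cursor, if any")
	fs.StringVar(&o.Config, "config", "config.yml", "path to the configuration file")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("only=%q drop=%q config=%q\n", o.Only, o.Drop, o.Config)
}
```

Using a dedicated `FlagSet` rather than the package-level flags keeps the parser easy to exercise with arbitrary argument slices.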
docker build -t replication .
docker run -e CLICKHOUSE_DSN=<clickhouse_dsn> \
-e DATABASE_URL=<database_url> \
replication \
[-only=<table_name>] \
[-config=<path>]