
SneakersAPI/replication


Replication

Script used at SneakersAPI.dev to replicate data from ClickHouse to PostgreSQL. Which tables are synced, and how, is defined in a YAML configuration file.

Key features:

  • Replicates data from ClickHouse to PostgreSQL.
  • Manages primary keys, indexes, and destination column types.
  • Syncs time-series data incrementally via a cursor to avoid full table scans.
  • Processes data in batches, staged through temporary tables on a separate thread and connection.
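The cursor and temporary-table features above can be illustrated with a minimal Go sketch of the SQL such a tool might generate. This is not the repository's code: the function names (`cursorQuery`, `upsertFromTemp`), the example table and column names, and the use of `?` positional placeholders in the ClickHouse query are all assumptions. The idea is that each run reads only rows past the last stored cursor value, stages them in a temporary PostgreSQL table, and then merges them into the destination with `INSERT ... ON CONFLICT ... DO UPDATE`.

```go
package main

import (
	"fmt"
	"strings"
)

// contains reports whether s is in list.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// cursorQuery (hypothetical) builds a ClickHouse SELECT that fetches only rows
// past the last synced cursor value, so a run never rescans the whole table.
// Identifiers are assumed to be trusted values from the YAML config; the cursor
// value itself is bound through the `?` placeholder.
func cursorQuery(table, cursorCol string, columns []string, batchSize int) string {
	return fmt.Sprintf(
		"SELECT %s FROM %s WHERE %s > ? ORDER BY %s LIMIT %d",
		strings.Join(columns, ", "), table, cursorCol, cursorCol, batchSize,
	)
}

// upsertFromTemp (hypothetical) builds a PostgreSQL statement that merges a
// batch staged in a temporary table into the destination table. Rows whose
// primary key already exists get their non-key columns overwritten.
func upsertFromTemp(dest, tmp string, pk, columns []string) string {
	var updates []string
	for _, c := range columns {
		if !contains(pk, c) {
			updates = append(updates, fmt.Sprintf("%s = EXCLUDED.%s", c, c))
		}
	}
	cols := strings.Join(columns, ", ")
	stmt := fmt.Sprintf("INSERT INTO %s (%s) SELECT %s FROM %s ON CONFLICT (%s)",
		dest, cols, cols, tmp, strings.Join(pk, ", "))
	if len(updates) == 0 {
		// Only key columns: nothing to update on conflict.
		return stmt + " DO NOTHING"
	}
	return stmt + " DO UPDATE SET " + strings.Join(updates, ", ")
}

func main() {
	cols := []string{"id", "created_at", "price"}
	fmt.Println(cursorQuery("sales", "created_at", cols, 50000))
	fmt.Println(upsertFromTemp("sales", "sales_tmp", []string{"id"}, cols))
}
```

A strict `>` comparison assumes cursor values are unique; if several rows share the value at a batch boundary, some could be skipped, so a real implementation may need a tie-breaker column. PostgreSQL also rejects an `ON CONFLICT DO UPDATE` that touches the same key twice in one statement, so each staged batch should be free of duplicate primary keys.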

About performance:

Measured from table creation to the last upsert, with a batch size of 50k rows:

  • 800k rows with 5 columns: around 10s, 80k rows/s
  • 170k rows with 18 columns: around 5s, 34k rows/s
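The figures above follow from simple arithmetic: throughput is rows divided by elapsed time, and the number of 50k-row batches is the row count divided by the batch size, rounded up. The sketch below (helper names are hypothetical) reproduces the published numbers and also computes values (rows x columns) per second, which suggests the wider table is not slower per value: roughly 400k values/s for the 5-column run versus about 612k values/s for the 18-column run. Since the timings are approximate, so are the derived figures.

```go
package main

import "fmt"

// batches returns how many batches of batchSize rows cover rows (ceiling division).
func batches(rows, batchSize int) int {
	return (rows + batchSize - 1) / batchSize
}

// throughput returns rows per second.
func throughput(rows int, seconds float64) float64 {
	return float64(rows) / seconds
}

// valuesPerSecond returns individual column values (rows x columns) per second.
func valuesPerSecond(rows, columns int, seconds float64) float64 {
	return float64(rows*columns) / seconds
}

func main() {
	// Approximate timings from the README, with a 50k-row batch size.
	runs := []struct {
		rows, columns int
		seconds       float64
	}{
		{800_000, 5, 10},
		{170_000, 18, 5},
	}
	for _, r := range runs {
		fmt.Printf("%d rows x %d columns: %d batches, %.0f rows/s, %.0f values/s\n",
			r.rows, r.columns, batches(r.rows, 50_000),
			throughput(r.rows, r.seconds), valuesPerSecond(r.rows, r.columns, r.seconds))
	}
}
```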

Note: This tool may not be the best fit for high data volumes; it has only been tested with fewer than 10 million rows.

Configuration

Configuration is done via a YAML file. See config.example.yml for reference.

Running

export CLICKHOUSE_DSN=<clickhouse_dsn>
export DATABASE_URL=<database_url>

go run . [-only=<table_name>] [-drop=<table_name>] [-config=<path>]
  • -only=<table_name>: Process only the specified table instead of all configured tables.
  • -drop=<table_name>: Drop the table after processing and reset its cursor, if any.
  • -config=<path>: Path to the configuration file. Defaults to config.yml.
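A minimal sketch of how the documented flags and environment variables could be read with Go's standard `flag` and `os` packages. The `options` struct and the `parseOptions` and `requireEnv` functions are hypothetical, and the repository may parse its arguments differently; only the flag names, the `config.yml` default, and the two environment variable names come from this README.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options (hypothetical) mirrors the documented command-line flags.
type options struct {
	only, drop, config string
}

// parseOptions parses the documented flags from args (without the program name).
func parseOptions(args []string) (options, error) {
	fs := flag.NewFlagSet("replication", flag.ContinueOnError)
	var o options
	fs.StringVar(&o.only, "only", "", "process only this table")
	fs.StringVar(&o.drop, "drop", "", "drop this table after processing and reset its cursor")
	fs.StringVar(&o.config, "config", "config.yml", "path to the YAML configuration file")
	err := fs.Parse(args)
	return o, err
}

// requireEnv returns the values of the given environment variables,
// failing on the first one that is unset or empty.
func requireEnv(keys ...string) (map[string]string, error) {
	vals := make(map[string]string, len(keys))
	for _, k := range keys {
		v := os.Getenv(k)
		if v == "" {
			return nil, fmt.Errorf("%s is not set", k)
		}
		vals[k] = v
	}
	return vals, nil
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if err != nil {
		// flag already printed the error and usage to stderr.
		os.Exit(2)
	}
	if _, err := requireEnv("CLICKHOUSE_DSN", "DATABASE_URL"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("only=%q drop=%q config=%q\n", o.only, o.drop, o.config)
}
```

Using a dedicated `flag.FlagSet` with `ContinueOnError` keeps parsing testable, since errors are returned instead of terminating the process.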

Docker

docker build -t replication .
docker run -e CLICKHOUSE_DSN=<clickhouse_dsn> \
    -e DATABASE_URL=<database_url> \
    replication \
    [-only=<table_name>] \
    [-config=<path>]