source-postgres: Handle table renames by re-backfilling #2279

Open
willdonnelly opened this issue Jan 16, 2025 · 0 comments
Labels
change:unplanned Unplanned change, useful for things like doc updates

Comments

@willdonnelly
Member

In general our CDC replication is very solid in terms of capturing all changes to the source table, but there is one situation where it's possible to completely replace the contents of a source table without a single peep from our connector.

The way you do that is to load a bunch of new data into a staging table, then drop the old table and rename the staging table to the real table name. Since Postgres logical replication doesn't directly tell us about table dropping or renaming, we won't receive any messages about this and thus won't have anything to emit ourselves. This is not ideal, to say the least.
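To make the scenario concrete, here is a toy model of the swap. It is only a sketch: `ToyCatalog` is a hypothetical stand-in for the Postgres catalog, and the table names `orders` and `orders_staging` are invented for illustration. The point it shows is that the table name stays the same while the object behind it (and therefore its OID) changes.

```python
import itertools


class ToyCatalog:
    """Hypothetical stand-in for pg_class: maps table names to OIDs."""

    def __init__(self):
        # Postgres assigns user objects OIDs from 16384 upward; the exact
        # values do not matter here, only that each new table gets a new one.
        self._next_oid = itertools.count(16384)
        self.tables = {}

    def create(self, name):
        self.tables[name] = next(self._next_oid)

    def drop(self, name):
        del self.tables[name]

    def rename(self, old, new):
        # A rename keeps the OID and only changes the name.
        self.tables[new] = self.tables.pop(old)


catalog = ToyCatalog()
catalog.create("orders")
oid_before = catalog.tables["orders"]

# Roughly equivalent SQL (table names are illustrative):
#   CREATE TABLE orders_staging (LIKE orders INCLUDING ALL);
#   ... load new data into orders_staging ...
#   DROP TABLE orders;
#   ALTER TABLE orders_staging RENAME TO orders;
catalog.create("orders_staging")
catalog.drop("orders")
catalog.rename("orders_staging", "orders")
oid_after = catalog.tables["orders"]

print(oid_before, oid_after)  # same name, different OIDs
```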

It is possible in principle to address this, but it requires periodically polling the OIDs of all active tables, persisting that information across task restarts, and then triggering a stream restart if the OID ever changes (or if the table disappears entirely for a while). It might be possible to combine this with normal discovery, but it also might not be worth the effort to do that.
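The polling idea could look roughly like the sketch below. Everything here is an assumption for illustration rather than connector code: the `TableState` type, the `check_tables` function, the `max_missing` threshold standing in for "disappears entirely for a while", and the JSON helpers standing in for whatever checkpoint mechanism persists state across task restarts. The catalog query is shown only as a string and is not executed.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass

# Query a poller might run; pg_class.oid differs once a table has been
# dropped and replaced by another table under the same name.
OID_QUERY = """
SELECT n.nspname, c.relname, c.oid
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r', 'p')
"""


@dataclass
class TableState:
    """Hypothetical per-table state persisted between polls."""
    oid: int
    missing_polls: int = 0


def check_tables(state: dict[str, TableState],
                 current: dict[str, int],
                 max_missing: int = 3) -> list[str]:
    """Update `state` in place; return tables whose stream should restart."""
    restart = []
    for name, entry in state.items():
        oid = current.get(name)
        if oid is None:
            # Table not found: tolerate a brief gap (mid-swap), then give up.
            entry.missing_polls += 1
            if entry.missing_polls == max_missing:
                restart.append(name)
        elif oid != entry.oid:
            # Same name, different object: contents may be entirely new.
            entry.oid = oid
            entry.missing_polls = 0
            restart.append(name)
        else:
            entry.missing_polls = 0
    return restart


def dump_state(state: dict[str, TableState]) -> str:
    """Serialize state so it can survive a task restart."""
    return json.dumps({name: asdict(entry) for name, entry in state.items()})


def load_state(raw: str) -> dict[str, TableState]:
    return {name: TableState(**fields) for name, fields in json.loads(raw).items()}
```

A caller would build `current` from the query results on each poll, keyed the same way as `state` (for example `"public.orders"`), and re-backfill every table that `check_tables` returns.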

It would be nice if this solution could also address table truncation, but we shouldn't try to tackle that here for three reasons:

  1. I'm not sure if there's any good way to reliably detect truncation based on catalog metadata.
  2. We don't currently have collection-level truncation signals, so even if we detected that the source table was truncated there would be nothing useful for the connector to emit.
  3. Because of (2), our other SQL CDC connectors currently ignore truncations, so we might as well keep things consistent for now.
@willdonnelly willdonnelly added the change:unplanned Unplanned change, useful for things like doc updates label Jan 16, 2025