This repository has been archived by the owner on Aug 4, 2023. It is now read-only.
Fixes
Fixes WordPress/openverse#1322 by @stacimc
Description
Some ingestion days for the `metropolitan_museum_reingestion_workflow` have been failing due to `DuplicateTable` errors. This is because the load table names are generated with the pattern `provider_data_{media_type}_{provider_name}_{timestamp}_{day_shift}`, but the very long provider name for this DAG causes the table name to be truncated in Postgres (identifiers are limited to 63 bytes by default), such that the `day_shift` and part of the timestamp (which are required for uniqueness) are cut off.
This PR changes the order of the component parts of the table name such that `provider_name` is on the end, and is therefore the part that will get truncated if it is too long. This guarantees that `media_type`, `timestamp`, `day_shift`, and the first part of the provider name will fit.
Functionally, this means that part of `reingestion` gets cut off. Uniqueness is only an issue if we create two DAGs that have very long provider names, which only differ in the final few characters.
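
The truncation problem described above can be illustrated with a minimal sketch. It is not the repository's actual code: the helper names, the timestamp format, and the exact order of `media_type`, `timestamp`, and `day_shift` in the new pattern are assumptions made for illustration. The only parts taken from the description are the original pattern and the fact that `provider_name` moves to the end. Truncation is modelled as a simple slice to 63 characters, which matches Postgres's default identifier limit for ASCII names.

```python
# Postgres truncates identifiers to 63 bytes by default (NAMEDATALEN - 1).
POSTGRES_MAX_IDENTIFIER_LENGTH = 63


def truncate_identifier(name: str) -> str:
    """Mimic Postgres identifier truncation (assumes ASCII-only names)."""
    return name[:POSTGRES_MAX_IDENTIFIER_LENGTH]


def old_load_table_name(media_type, provider_name, timestamp, day_shift):
    # Original pattern: the parts needed for uniqueness come last,
    # so they are the first to be lost when the name is too long.
    return f"provider_data_{media_type}_{provider_name}_{timestamp}_{day_shift}"


def new_load_table_name(media_type, provider_name, timestamp, day_shift):
    # Assumed new ordering: provider_name is last and absorbs any truncation.
    return f"provider_data_{media_type}_{timestamp}_{day_shift}_{provider_name}"


# Hypothetical inputs: the timestamp format is an assumption.
PROVIDER = "metropolitan_museum_reingestion_workflow"
TIMESTAMP = "20230101T000000"

old_day_1 = truncate_identifier(old_load_table_name("image", PROVIDER, TIMESTAMP, 1))
old_day_2 = truncate_identifier(old_load_table_name("image", PROVIDER, TIMESTAMP, 2))
new_day_1 = truncate_identifier(new_load_table_name("image", PROVIDER, TIMESTAMP, 1))
new_day_2 = truncate_identifier(new_load_table_name("image", PROVIDER, TIMESTAMP, 2))

# With the old ordering both day shifts collapse to the same table name.
print(old_day_1 == old_day_2)
# With provider_name at the end, the truncated names stay distinct.
print(new_day_1 != new_day_2)
print(new_day_1)
```

With these inputs, the old pattern loses `day_shift` entirely, while the reordered pattern keeps it and trims only the tail of the provider name.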
Alternatives considered
- Rename the DAG to something shorter, such as `met_museum_reingestion_workflow` or `metropolitan_reingestion_workflow`. The problem could still arise in the future.
- Remove the `_reingestion` suffix from the load table name. This would work because the 'normal' ingestion flow does not append `day_shift`, so the two DAGs would still be distinct, but it is less human readable, and we could still encounter the problem with long provider names in the future.

Testing Instructions
Run the Metropolitan reingestion DAG -- make sure to set a very low INGESTION_LIMIT before you start. Check the logs to make sure the load table names are being generated as expected.
I also ran a few other DAGs, including `wikimedia_commons_workflow` and `wikimedia_reingestion_workflow`.
Developer Certificate of Origin