Adjust Flickr reingestion schedule #1285
Labels
✨ goal: improvement
Improvement to an existing user-facing feature
🟨 priority: medium
Not blocking but should be addressed soon
🧱 stack: catalog
Related to the catalog and Airflow DAGs
⛔ status: blocked
Blocked & therefore, not ready for work
Problem
Per the discussion in WordPress/openverse-catalog#995, we will soon be enabling the Flickr DAG, but we expect that the current implementation will cause some unique results to be skipped during ingestion. To compensate, we want to run reingestion more often so that each past day is revisited more frequently, which should give us greater coverage.
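To illustrate why running reingestion more often improves coverage, here is a minimal Python sketch. It assumes, hypothetically, that each reingestion run revisits a fixed list of day offsets before its run date. The offset list `OFFSETS` and the helper `reingestion_counts` are illustrative names, not part of the catalog codebase, and the real Flickr reingestion configuration may differ.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical day offsets revisited by each run ("1 day ago", "2 days ago", ...).
# Illustrative assumption only, not the real Flickr reingestion configuration.
OFFSETS = [1, 2, 3, 7, 14, 30]


def reingestion_counts(start: date, end: date, step_days: int,
                       offsets: list[int] = OFFSETS) -> Counter:
    """Count how often each date is reingested by runs from start to end.

    A run happens every ``step_days`` days (1 for @daily, 7 for @weekly), and
    each run reingests ``run_date - offset`` for every offset in ``offsets``.
    """
    counts: Counter = Counter()
    run_date = start
    while run_date <= end:
        for offset in offsets:
            counts[run_date - timedelta(days=offset)] += 1
        run_date += timedelta(days=step_days)
    return counts


start, end = date(2023, 1, 1), date(2023, 3, 31)
target = date(2023, 1, 10)
daily = reingestion_counts(start, end, step_days=1)
weekly = reingestion_counts(start, end, step_days=7)
print(daily[target], weekly[target])  # 6 0
```

Under these assumptions, a daily schedule revisits the target date once per offset, while a weekly schedule only revisits dates that happen to line up with a run day, so some dates are never reingested at all.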
Description
At minimum, we should update the Flickr reingestion workflow to run on a `@daily` schedule (it is currently `@weekly`). Before making this change, we should wait until the current implementation has run successfully a few times and we have data showing that it can complete in under 24 hours. Depending on how long the reingestion DAG takes, we may also be able to increase the number of reingestion days.
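The go/no-go check for switching to a daily schedule could be sketched as follows. These are hypothetical helpers, not catalog code: they assume we have a list of observed reingestion run durations and that runtime scales roughly linearly with the number of reingestion days, which real runs may not do.

```python
from datetime import timedelta

# A daily schedule needs each run to finish before the next one starts.
DAILY_BUDGET = timedelta(hours=24)


def fits_daily_schedule(run_durations: list[timedelta],
                        budget: timedelta = DAILY_BUDGET) -> bool:
    """Return True if every observed run finished in under the budget."""
    return bool(run_durations) and max(run_durations) < budget


def max_reingestion_days(run_durations: list[timedelta], days_per_run: int,
                         budget: timedelta = DAILY_BUDGET) -> int:
    """Estimate how many reingestion days fit in the budget.

    Assumes (hypothetically) that runtime grows linearly with the number of
    days reingested, and uses the slowest observed run as the baseline.
    Raises ValueError if no durations are given.
    """
    per_day = max(run_durations) / days_per_run
    return int(budget / per_day)


observed = [timedelta(hours=10), timedelta(hours=12)]  # hypothetical measurements
print(fits_daily_schedule(observed))                   # True
print(max_reingestion_days(observed, days_per_run=6))  # 12
```

Basing the estimate on the slowest observed run rather than the average leaves some headroom; in practice we would likely also want a safety margin below the full 24 hours.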
Alternatives
Alongside this effort, we also plan to explore other ways of modifying the Flickr DAG to avoid the problems with duplicate and missing records. This work is meant to improve the DAG in the meantime.