Releases: WordPress/openverse-catalog
v1.5.3
Improvements
- Improve request count logging for Flickr (#1085) @stacimc
- Offset iNaturalist DAG from monthly by one day (#1072) @AetherUnbound
Internal Improvements
- Bump apache-airflow[amazon,http,postgres] from 2.5.2 to 2.5.3 (#1082) @dependabot
- Retired update_phylopic_foreign_identifier DAG. (#1084) @ArslanYM
- Move terminate long queries DAG to retired (#1078) @AetherUnbound
- Bump pre-commit from 3.2.0 to 3.2.1 (#1081) @dependabot
Bug Fixes
- Update references to just airflow in README #1076 (#1088) @Prathamdas3
- Check required fields in media store (#1086) @stacimc
- Extend Metropolitan reingestion DAG timeout (#1087) @stacimc
- Update PhyloPic DAG to use API v2 (#1060) @krysal
- Create DAG to fix PhyloPic's foreign_identifier column (#1074) @krysal
Credits
Thanks to @AetherUnbound, @ArslanYM, @Prathamdas3, @dependabot, @dependabot[bot], @krysal, @openverse-bot and @stacimc for their contributions!
v1.5.2
New Features
- Log last query_params hit before AirflowTaskTimeout (#1058) @stacimc
- Update README.md with documentation reference (#1052) @itemrarity
- Add DAG for terminating long-running queries (#1050) @stacimc
Improvements
- Update Freesound to quarterly, extend timeout (#1068) @stacimc
- Update Flickr large batch handling (#1047) @stacimc
- Add SuggestedSubProvider type (#1040) @stacimc
- Add option to skip specific ingestion errors (#1011) @stacimc
- Add a DAG for backfilling license_url when meta_data is null (#1005) @obulat
- Improve license URL validation (#1028) @obulat
- Add flickr sub provider auditing dag (#1034) @stacimc
- Add Airflow variable used to configure overrides for task timeouts (#976) @stacimc
- Add logging to iNaturalist date check (#1035) @rwidom
- Update Dockerfiles with small improvements (#1016) @dhruvkb
- Update Flickr to use new time delineated ingester class (#995) @stacimc
Internal Improvements
- Add isort configuration file (#1054) @raiyaj
- Update pgcli version to 3.5.0 (#1070) @AetherUnbound
- Bump apache-airflow[amazon,http,postgres] from 2.5.1 to 2.5.2 (#1064) @dependabot
- Bump pre-commit from 3.1.1 to 3.2.0 (#1065) @dependabot
- Add required stack label to dependabot PRs (#1063) @AetherUnbound
- Remove Implementation section from issue templates (#992) @miikkuu
- Bump pytest-socket from 0.5.1 to 0.6.0 (#1029) @dependabot
- Bump pre-commit from 3.0.2 to 3.1.1 (#1030) @dependabot
- Speed up some tests (#1021) @AetherUnbound
- Add an "Airflow Alert" issue template (#994) @AetherUnbound
- 🔄 synced file(s) with WordPress/openverse (#993) @openverse-bot
- Remove unnecessary dev dependencies (#990) @miikkuu
Bug Fixes
- Add required stack label to dependabot PRs (#1063) @AetherUnbound
- Handle the upper case licenses in the add_license_dag (#1049) @obulat
- Remove watermarked setting for SMK (#1048) @AetherUnbound
- Adjust schedule for long running queries termination (#1051) @obulat
- Use Python to group items by license to speed up the query (#1045) @obulat
- Remove alternate image extraction from SMK, fix foreign landing URL (#1003) @AetherUnbound
- Update LICENSE to match main repo (#1042) @dhruvkb
- Tweak Flickr time division settings, add logs (#1041) @stacimc
- Add trailing slash to Jamendo thumbnail URLs (#1038) @AetherUnbound
- Adjust Flickr max records to account for incorrect reporting (#1031) @stacimc
- Temporarily turn off scheduled image data refreshes, increase matview refresh timeout (#1036) @stacimc
- Wikimedia: re-attempt large batches with reduced parameter selection (#1008) @AetherUnbound
- Increase image matview refresh timeout, remove retries, better timeouts (#1014) @AetherUnbound
- Terminate PG query when task is killed via Airflow (#717) @rwidom
- Ensure uniqueness of load table names (#1009) @stacimc
- Preserve trailing slashes for WordPress URLs (#1006) @AetherUnbound
- Replaced execution_date with logical_date (#1001) @sora-san45
- Remove API & Frontend repos from PR reminder check (#1010) @AetherUnbound
- Add dayshift to tsv filenames for reingestion workflows (#969) @stacimc
- Update Europeana endpoint (#974) @stacimc
Credits
Thanks to @AetherUnbound, @dependabot, @dependabot[bot], @dhruvkb, @itemrarity, @miikkuu, @obulat, @openverse-bot, @raiyaj, @rwidom, @sora-san45 and @stacimc for their contributions!
v1.5.1
New Features
- Add a Nappy provider DAG using ProviderDataIngester (#796) @zackkrida
Improvements
- Stripping the ImageCategory class of its "Enum" derivative, creating a new AudioCategory class, and removing 2 troublesome test directories (#971) @flamesjames
- Add task ID pattern to notification skip criteria (#960) @AetherUnbound
- Fill creator name in Finnish museum DAG (#978) @krysal
- Make Phylopic a dated-only DAG (#944) @AetherUnbound
- iNaturalist in-SQL loading (#745) @rwidom
- Set filesize and duration overflowed values to None (#945) @AetherUnbound
- Update the deployment docs to showcase the bump version command (#962) @AetherUnbound
Internal Improvements
- Upgrade to Airflow 2.5.1, remove old warnings (#985) @AetherUnbound
- Bump docker/build-push-action from 3 to 4 (#987) @dependabot
- 🔄 synced file(s) with WordPress/openverse (#986) @openverse-bot
- Bump pook from 1.0.2 to 1.1.1 (#982) @dependabot
- Bump isort from 5.11.4 to 5.12.0 (#980) @dependabot
- Stripping the ImageCategory class of its "Enum" derivative, creating a new AudioCategory class, and removing 2 troublesome test directories (#971) @flamesjames
- 🔄 synced file(s) with WordPress/openverse (#963) @openverse-bot
- Bump pre-commit from 2.21.0 to 3.0.2 (#983) @dependabot
Bug Fixes
- Discard audio file when preview 404s fetching filesize (#973) @stacimc
- Finnish DAG: Dynamically generate timeslices depending on the amount of records (#934) @stacimc
- Increase Europeana reingestion timeout to 16 hours (#970) @sora-san45
- Temporarily increase Freesound delay & timeout (#943) @AetherUnbound
- Defer retrieval of GitHub API key to runtime (#948) @AetherUnbound
- Fix rotate_db_snapshot ARN template (#961) @sarayourfriend
Credits
Thanks to @AetherUnbound, @dependabot, @dependabot[bot], @flamesjames, @krysal, @openverse-bot, @rwidom, @sarayourfriend, @sora-san45, @stacimc and @zackkrida for their contributions!
v1.5.0
Improvements
- Upgrade to Airflow 2.5.0 (#939) @AetherUnbound
Internal Improvements
- 🔄 Synced file(s) with WordPress/openverse (#949) @openverse-bot
- 🔄 Synced file(s) with WordPress/openverse (#938) @openverse-bot
- Bump isort from 5.10.1 to 5.11.4 (#940) @dependabot
- Bump pre-commit from 2.20.0 to 2.21.0 (#941) @dependabot
- Bump black from 22.10.0 to 22.12.0 (#942) @dependabot
Bug Fixes
- Met Museum Reingestion timeout set to 16 hours (#958) @muddi900
- Allow no content responses from GitHub (#937) @AetherUnbound
- Handle Freesound 404s when fetching audio sets (#928) @stacimc
- Adjust year ranges for Science Museum (#946) @stacimc
- 🔄 Synced file(s) with WordPress/openverse (#938) @openverse-bot
Credits
Thanks to @AetherUnbound, @dependabot, @dependabot[bot], @muddi900, @openverse-bot and @stacimc for their contributions!
v1.4.1
- RDS Snapshot rotation DAG (#904) @sarayourfriend
Credits
Thanks to @sarayourfriend for their contributions!
v1.4.0
Improvements
- Add DAG tag for showing dated vs full ingestion (#908) @AetherUnbound
- Retire Common Crawl module & DAGs (#870) @AetherUnbound
- Add HEAD requests support to DelayedRequester (#865) @twstokes
Internal Improvements
- 🔄 Synced file(s) with WordPress/openverse (#921) @openverse-bot
- Use Airflow base Docker image (#874) @AetherUnbound
- 🔄 Synced file(s) with WordPress/openverse (#918) @openverse-bot
- Replace deprecated set-output command (#910) @krysal
- Bump pytest-sugar from 0.9.5 to 0.9.6 (#900) @dependabot
- Bump flake8 from 5.0.4 to 6.0.0 (#901) @dependabot
- 🔄 Synced file(s) with WordPress/openverse (#896) @openverse-bot
- Only report DAG sync when DAG files have changed (#872) @AetherUnbound
- 🔄 Synced file(s) with WordPress/openverse (#871) @openverse-bot
Bug Fixes
- Break load_from_s3 into separate tasks to fix duplicate reporting (#914) @stacimc
- Halt ingestion when WordPress Photo Directory reaches last page (#916) @stacimc
- Make Finnish DAG dated (#879) @stacimc
- Reinstate image thumbnail column (#903) @krysal
- Handle Science Museum errors with batches larger than 50 pages (#905) @stacimc
- Add Rawpixel to image popularity recalculation logic (#897) @AetherUnbound
- Only report DAG sync when DAG files have changed (#872) @AetherUnbound
Credits
Thanks to @AetherUnbound, @dependabot, @krysal, @openverse-bot, @stacimc and @twstokes for their contributions!
v1.3.6
- Filter out the extra fields with no additional info in Cleveland Museum metadata (#851) @satya-vinay
New Features
- Add reingestion DAG for Phylopic (#830) @stacimc
- Add reingestion workflow for Metropolitan (#819) @stacimc
Improvements
- Unify pre-commit config across repos (#867) @dhruvkb
- Remove unused arguments in MediaStore (#862) @bengreeley
- Remove legacy provider DAG logic (#849) @stacimc
- Refactor WordPress to use ProviderDataIngester (#835) @AetherUnbound
- Refactor Europeana to use ProviderDataIngester (#821) @sarayourfriend
- Refactor Rawpixel to use ProviderDataIngester (#795) @AetherUnbound
- Refactor Flickr to use ProviderDataIngester (#809) @stacimc
- Respect ingestion limit in process_batch (#818) @stacimc
- Refactor Phylopic to use ProviderDataIngester (#747) @AetherUnbound
- Add docs and template ProviderDataIngester (#790) @stacimc
- Refactor NYPL script to use ProviderDataIngester class (#630) @obulat
- Allow default retry count to be determined by environment variable (#806) @Pmeet
- Use English for SMK results (#807) @zackkrida
- Respect the ingestion limit if ingest_records is called multiple times (#804) @stacimc
- Retire TSV loading workflow (#789) @AetherUnbound
Internal Improvements
- Bump apache-airflow[amazon,http,postgres] from 2.4.1 to 2.4.2 (#842) @dependabot
- Unify pre-commit config across repos (#867) @dhruvkb
- Adding pyupgrade pre-commit hook (#866) @aqeelat
- Bump docker/login-action from 1 to 2 (#840) @dependabot
- Unpin pytest-xdist (#843) @dependabot
- Bump alex-page/github-project-automation-plus from 0.8.1 to 0.8.2 (#841) @dependabot
- Bump actions/download-artifact from 2 to 3 (#839) @dependabot
- Make dependabot update github actions dependencies (#832) @krysal
- Refactor Smithsonian Museum to use ProviderDataIngester class (#812) @krysal
- 🔄 Synced file(s) with WordPress/openverse (#816) @openverse-bot
- Add condition to not notify Slack when there are no files in the dags folder (#792) @ilitotor
- Allow default retry count to be determined by environment variable (#806) @Pmeet
- Standardize Airflow Variable names to uppercase (#801) @davcortez
- 🔄 Synced file(s) with WordPress/openverse (#814) @openverse-bot
- 🔄 Synced file(s) with WordPress/openverse (#802) @openverse-bot
Bug Fixes
- Override get_should_continue for science museum ingester (#868) @aqeelat
- Science Museum: Handle unrecognized licenses (#850) @AetherUnbound
- Fix Europeana success test, activate reingestion workflows (#848) @AetherUnbound
- Retire Thingiverse, etlMods, commoncrawl ETL (#833) @AetherUnbound
- Restore the old dag_id for wikimedia_reingestion_workflow (#837) @stacimc
- 🔄 Synced file(s) with WordPress/openverse (#816) @openverse-bot
- Standardize Airflow Variable names to uppercase (#801) @davcortez
- 🔄 Synced file(s) with WordPress/openverse (#814) @openverse-bot
- 🔄 Synced file(s) with WordPress/openverse (#802) @openverse-bot
- Made improvements to CONTRIBUTING.md (#791) @kavyabhat02
Credits
Thanks to @AetherUnbound, @Pmeet, @aqeelat, @bengreeley, @davcortez, @dependabot, @dependabot[bot], @dhruvkb, @ilitotor, @kavyabhat02, @krysal, @obulat, @openverse-bot, @sarayourfriend, @satya-vinay, @stacimc and @zackkrida for their contributions!
v1.3.5
Improvements
- Refactor Freesound to use ProviderDataIngester (#746) @AetherUnbound
- Refactor Jamendo to use the ProviderDataIngester (#741) @stacimc
- Increase dependabot PR limit to 10 (#780) @AetherUnbound
- Add user agent to StockSnap header and use header in requests by default (#765) @rwidom
- Improved data refresh status reporting (#744) @AetherUnbound
- Refactor SMK script to use the ProviderDataIngester class (#742) @krysal
- Default unfurling of links and media to False in Slack notifications (#743) @stacimc
Internal Improvements
- Retire Walters Art Museum provider script (#786) @AetherUnbound
- Bump pytest-mock from 3.9.0 to 3.10.0 (#781) @dependabot
- Disable email on failure by default (#788) @AetherUnbound
- Add concurrency settings for workflow (#770) @alrz1999
- 🔄 Synced file(s) with WordPress/openverse (#787) @openverse-bot
- Increase dependabot PR limit to 10 (#780) @AetherUnbound
- 🔄 Synced file(s) with WordPress/openverse (#771) @openverse-bot
- Bump pre-commit from 2.14.0 to 2.20.0 (#779) @dependabot
- Bump tldextract from 3.3.1 to 3.4.0 (#777) @dependabot
- Bump apache-airflow[amazon,http,postgres] from 2.4.0 to 2.4.1 (#767) @dependabot
- Bump pytest-sugar from 0.9.4 to 0.9.5 (#751) @dependabot
- Bump isort from 5.9.3 to 5.10.1 (#764) @dependabot
- Bump black from 22.3.0 to 22.10.0 (#778) @dependabot
- Bump pytest-mock from 3.6.1 to 3.9.0 (#749) @dependabot
- Bump tldextract from 3.1.0 to 3.3.1 (#752) @dependabot
- Bump flake8 from 3.9.2 to 5.0.4 (#750) @dependabot
Bug Fixes
- Fix italics for duration disclosure (#769) @stacimc
- Remove periods after URLs in log lines. (#763) @kamiwis
- Add dependabot config (#740) @AetherUnbound
Credits
Thanks to @AetherUnbound, @alrz1999, @dependabot, @dependabot[bot], @kamiwis, @krysal, @openverse-bot, @rwidom and @stacimc for their contributions!
v1.3.4
Improvements
- Add tags option for provider workflows & "legacy-ingestion" tag (#739) @AetherUnbound
- Update reingestion workflows to load and report data (#618) @stacimc
Internal Improvements
- Add tags option for provider workflows & "legacy-ingestion" tag (#739) @AetherUnbound
- Bump Airflow to 2.4.0, standardize version bump process (#737) @AetherUnbound
- 🔄 Synced file(s) with WordPress/openverse (#735) @openverse-bot
- Add ShellCheck to pre-commit config (#718) @MustkimKhatik
Bug Fixes
- Update reingestion workflows to load and report data (#618) @stacimc
- 🔄 Synced file(s) with WordPress/openverse (#735) @openverse-bot
Credits
Thanks to @AetherUnbound, @MustkimKhatik, @openverse-bot and @stacimc for their contributions!
v1.3.3
Internal Improvements
- 🔄 Synced file(s) with WordPress/openverse (#733) @openverse-bot
Bug Fixes
- Bump Airflow version to 2.3.4 (#731) @AetherUnbound
- 🔄 Synced file(s) with WordPress/openverse (#733) @openverse-bot
Credits
Thanks to @AetherUnbound and @openverse-bot for their contributions!