-
Notifications
You must be signed in to change notification settings - Fork 54
Refactor StockSnap to use ProviderDataIngester #601
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 I am so excited to see this @rwidom, thanks for picking this up!
I was able to run this locally with real data, but first you'll need to update the StockSnap provider workflow definition in provider_workflows.py
to configure the ingester class:
ProviderWorkflow(
provider_script="stocksnap",
ingester_class=StockSnapDataIngester,
)
I left a couple comments for some errors being thrown at the moment, but then you should be able to run this locally and actually pull data. I don't think an API key is necessary.
StockSnap doesn't use query parameters, but reflects the incremental page in the endpoint url
All of these providers seem to have some little quirk! 😄 I like your idea of updating the endpoint with a page counter.
openverse_catalog/dags/providers/provider_api_scripts/stocksnap.py
Outdated
Show resolved
Hide resolved
openverse_catalog/dags/providers/provider_api_scripts/stocksnap.py
Outdated
Show resolved
Hide resolved
openverse_catalog/dags/providers/provider_api_scripts/stocksnap.py
Outdated
Show resolved
Hide resolved
openverse_catalog/dags/providers/provider_api_scripts/stocksnap.py
Outdated
Show resolved
Hide resolved
Co-authored-by: Staci Cooper <[email protected]>
Co-authored-by: Staci Cooper <[email protected]>
I don't seem to be able to run this with real data locally (at least, when I do I think everything else is resolved? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple quick comments I forgot as I was re-testing this -- thanks for your perseverance despite not being able to run this locally! I do think your problems must be related to your environment setup, since the API does not require authentication 😞
After those comments though, I was able to run StockSnap with and without an ingestion_limit
and pull/load data perfectly 🎉
openverse_catalog/dags/providers/provider_api_scripts/stocksnap.py
Outdated
Show resolved
Hide resolved
openverse_catalog/dags/providers/provider_api_scripts/stocksnap.py
Outdated
Show resolved
Hide resolved
openverse_catalog/dags/providers/provider_api_scripts/stocksnap.py
Outdated
Show resolved
Hide resolved
Co-authored-by: Staci Cooper <[email protected]>
Co-authored-by: Staci Cooper <[email protected]>
…github.com/WordPress/openverse-catalog into issue/595-refactor-stocksnap-4-newingester
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't provide as many in-depth suggestions for the actual code as other folks on the team, but I'm able to run this locally without issue. It's great to see the new ProviderDataIngester
in action!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great @rwidom! I hope you have been able to run it, it's working well. I have left just a few comments to improve the data we are getting from this provider, but really this is almost nearly complete. Excellent work!
openverse_catalog/dags/providers/provider_api_scripts/stocksnap.py
Outdated
Show resolved
Hide resolved
openverse_catalog/dags/providers/provider_api_scripts/stocksnap.py
Outdated
Show resolved
Hide resolved
openverse_catalog/dags/providers/provider_api_scripts/stocksnap.py
Outdated
Show resolved
Hide resolved
Co-authored-by: Krystle Salazar <[email protected]>
Co-authored-by: Krystle Salazar <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯 This looks great! Thanks so much for your perseverance with local testing issues :) Everything looks good, and running locally works great.
Thanks so much @zackkrida , @krysal and @stacimc! |
Thanks so much! I didn't see it had been approved when I just edited the overview. Oh well. Either way-- Yay! Onward! |
Fixes
Fixes WordPress/openverse#1512 by @stacimc
Description
Moves StockSnap provider API script to use the new ProviderDataIngester class, and updates a couple of the url and title details. (More detail in #299 and #555 ).
Testing Instructions
Check out the branch, and run
just test
as one piece.You can also go to the Airflow page and manually kick off the stocksnap_workflow dag, and take a look at the resulting tsv file. The Airflow log will give you the exact file name, and you can see it by running
just shell
andmore /var/workflow_output/stocksnap_image_v001_20220713202334.tsv
(or whatever the file is called). My only hesitation with running it that way is wanting to avoid abusing / overwhelming the stocksnap API.Checklist
Update index.md
).main
) or a parent feature branch.Developer Certificate of Origin
Developer Certificate of Origin