Document how the new GCP pipeline works #6

Closed
1 of 3 tasks
rviscomi opened this issue Feb 1, 2022 · 2 comments

Comments

@rviscomi
Member

rviscomi commented Feb 1, 2022

We could use the Wiki section of this repo to maintain the docs. It should document how the pipeline works and act as a playbook to handle unexpected issues.

wishlist

  • playbook (e.g. how to start the crawl)
  • expand on open issues / edge cases (e.g. what happens when a table is deleted mid-crawl)
  • go back through the history and document issues encountered with the Beam Python SDK
@giancarloaf
Collaborator

@tunetheweb heads up, I would love to get some fresh eyes on the process to run the pipeline and improve the documentation to cover everything someone new might need. I'll reach out again once we get closer to "completing" the pipeline so we can work on this together and you can poke holes in everything!

@max-ostapenko
Contributor

Closing this, as the pipeline has been deprecated.
See the new pipeline and docs here: https://github.com/HTTPArchive/dataform?tab=readme-ov-file

@max-ostapenko closed this as not planned Oct 18, 2024