
source-hackernews #1656

Open: wants to merge 8 commits into main

@danthelion (Contributor) commented Jun 14, 2024

Description:

(Describe the high level scope of new or changed features)

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

(anything that might help someone review this PR)


This change is Reviewable

@danthelion changed the title from "HN capture" to "source-hackernews" on Jun 14, 2024

@williamhbaker (Member) commented:

@danthelion is this something you're looking for a review on?

@danthelion (Contributor, Author) commented:

@williamhbaker, a bit late but yes 😁

@williamhbaker (Member) left a review:

Left a few comments!

.idea/.gitignore (outdated)
@@ -0,0 +1,8 @@
# Default ignored files

@williamhbaker (Member):

Mind removing this .idea folder from the files to commit?
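A minimal sketch of the usual fix, assuming the goal is to keep the IDE folder on disk but out of the repository. The helper ensure_ignored (a hypothetical name, not part of this repo) appends .idea/ to the repository's root .gitignore if it is not already listed. Ignoring a path does not untrack files that were already committed; git rm -r --cached .idea removes them from the index while leaving the local copies in place.

```python
from pathlib import Path


def ensure_ignored(gitignore: Path, pattern: str = ".idea/") -> bool:
    """Append `pattern` to a .gitignore file unless it is already listed.

    Hypothetical helper. Returns True if the file was changed,
    False if the pattern was already present.
    """
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    if pattern in (line.strip() for line in lines):
        return False
    # Keep existing entries and add the pattern on its own line.
    lines.append(pattern)
    gitignore.write_text("\n".join(lines) + "\n")
    return True
```

Run from the repository root as ensure_ignored(Path(".gitignore")); calling it twice is harmless because the second call finds the pattern and returns False.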

async def spec(self, log: Logger, _: request.Spec) -> ConnectorSpec:
    return ConnectorSpec(
        configSchema=EndpointConfig.model_json_schema(),
        documentationUrl="https://docs.estuary.dev",

@williamhbaker (Member):

FYI if you want to have actual docs for this connector, this is the URL to update for that.

import:
  - acmeCo/flow.yaml
captures:
  acmeCo/source-google-sheets:

@williamhbaker (Member):

The capture name needs to be updated (it still reads acmeCo/source-google-sheets).

@@ -0,0 +1,20 @@
credentials:

@williamhbaker (Member):

I don't think this config file is for this capture, since it doesn't look like this capture requires any configuration at all.

@danthelion (Contributor, Author):

Is the sops part required?

@williamhbaker (Member):

If the configuration has fields that need to be encrypted, yeah. There are some instructions for how to do the encryption here. But if this capture doesn't require any configuration, you could just delete the file.


item = Item.model_validate_json(req)

# stop the backfill when we catch up

@williamhbaker (Member):

Since only a fetch_page function is defined, the connector will only backfill: it stops once item.time > log_cutoff and then won't do anything else.

You might want to change this to a fetch_changes function; that way it will keep picking up records as new cursor values become available.
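To make the difference concrete, here is a plain-asyncio sketch of an incremental fetch that resumes from a cursor instead of stopping at a fixed cutoff. It is not the Estuary CDK's actual fetch_changes signature: the Item stand-in, the get_max_item/get_item callables, and the choice of the highest item id as the cursor are all assumptions. Using the id as a cursor relies on the Hacker News API exposing the current largest item id at https://hacker-news.firebaseio.com/v0/maxitem.json, with individual items available from /v0/item/<id>.json.

```python
from dataclasses import dataclass
from typing import AsyncIterator, Awaitable, Callable, Optional, Union


@dataclass
class Item:
    """Hypothetical minimal stand-in for the PR's pydantic Item model."""
    id: int
    time: int  # Unix timestamp, as returned by the Hacker News API


# A pass yields documents (Item) and finally the new cursor (an int item id).
Change = Union[Item, int]


async def fetch_changes(
    cursor: int,
    get_max_item: Callable[[], Awaitable[int]],
    get_item: Callable[[int], Awaitable[Optional[Item]]],
) -> AsyncIterator[Change]:
    """Emit every item with an id above `cursor`, then the advanced cursor.

    Unlike a backfill that stops at a fixed cutoff, this can be called again
    with the returned cursor to pick up items created since the last pass.
    """
    max_item = await get_max_item()
    for item_id in range(cursor + 1, max_item + 1):
        item = await get_item(item_id)
        if item is not None:  # this sketch treats missing ids as None
            yield item
    if max_item > cursor:
        yield max_item  # checkpoint: the next pass starts after this id
```

Each pass yields the new documents followed by the advanced cursor; calling it again with that cursor emits only items created in the meantime, and yields nothing at all when there are none.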

Also, just noting that there are over 40MM (40 million) total records and this is going to fetch them one by one, so it will take an extremely long time to reach the present.
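A rough estimate shows why sequential fetching is a problem. Assuming about 50 ms per request (an assumed figure, not a measurement), 40,000,000 × 0.05 s = 2,000,000 s, or roughly 556 hours (about 23 days). Keeping 50 requests in flight would cut that to about 11 hours, if the API tolerates that load, which is also an assumption to verify. The sketch below computes that estimate and fetches an id range with bounded concurrency using asyncio.Semaphore; get_item is a hypothetical fetcher supplied by the caller.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, TypeVar

T = TypeVar("T")


def estimated_hours(n_items: int, seconds_per_request: float, concurrency: int = 1) -> float:
    """Back-of-envelope backfill time, assuming latency dominates and requests overlap perfectly."""
    return n_items * seconds_per_request / concurrency / 3600


async def fetch_range(
    start: int,
    stop: int,
    get_item: Callable[[int], Awaitable[Optional[T]]],
    concurrency: int = 50,
) -> List[Optional[T]]:
    """Fetch ids in [start, stop) with at most `concurrency` requests in flight.

    Results come back in id order, so a cursor can be advanced to
    `stop - 1` once the whole range has completed.
    """
    sem = asyncio.Semaphore(concurrency)

    async def one(item_id: int) -> Optional[T]:
        async with sem:  # blocks while `concurrency` fetches are already running
            return await get_item(item_id)

    return await asyncio.gather(*(one(i) for i in range(start, stop)))
```

Because asyncio.gather returns results in the order the awaitables were passed, the cursor only needs to move once per completed range, which keeps checkpointing simple.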

2 participants