What does an ideal metadata setup look like for an SDK-based Singer (or EDK) repo + MeltanoHub? #1205
@tayloramurphy this is great! I have tons of thoughts so I'm going to split them out as separate comments so we can thread off of them if needed. Overall I agree with the direction though!
Additional Proposed Tenets
Connector maintainers shouldn't have to make any hub contributions to participate.
From the beginning we wanted the hub to be a place where users and connector maintainers could contribute to keep the ecosystem up to date, but it hasn't really played out that way. It's too difficult for contributors to add new plugins or to differentiate between a tap bug and a metadata bug, so we don't get very many contributions. I also don't think a tap developer should have to think about updating metadata on the hub when they make changes. Ideally they would make one contribution to the hub, which is an issue requesting that the connector be added. That's it. We could have dedicated hub contributors that help add, update, etc., but that's the responsibility of a few vs. expecting the entire ecosystem of tap developers to get on board.
Over time we came to terms with the reality that unless we put in the work to maintain the hub metadata, it won't get done. No contributors are coming along to do the hard work of scraping GitHub for new connectors or adding the new setting they implemented, so we did it ourselves. At this point we have automation and assisted processes in place that make it significantly easier for us to maintain the hub, but those tools and that know-how aren't really available to community members. They have to hand-write metadata definitions and go through rounds of feedback to get them to the standard of the automated processes. In most cases it ends up being easier for everyone if we just get an issue requesting that a plugin be added or updated.
Proposal:
Connector Repo to be SSOT for metadata
I completely agree with this! One challenge is that we have to go to each individual repo to suggest docs changes vs how we have it now where we could theoretically update all definitions in a single PR. Although having an override mechanism on the hub side would make some of this simpler. Thoughts:
An Idea: I've noticed that connector READMEs are usually a bit sparse, and sometimes I have a hard time figuring out what the source actually is (especially if the service is something vague like …).
Building on that idea, we need a way for them to include markdown text for things like general info or advanced settings that need more than a single-line description from the tap.py file. We could include something like the hub.yml (maybe call it docs.yml instead) to do this, as long as it's easy to include markdown in there. Then again, we'd be able to auto-generate their README using that file as input, and also use that file as input to the hub, so we avoid the issue of the hub and the README having different info. It sort of starts to remind me of the dbt docs blocks approach: https://docs.getdbt.com/docs/collaborate/documentation#using-docs-blocks.
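To make the single-input idea a bit more concrete, here is a minimal sketch of one file feeding both the README and the hub. Everything in it is an assumption for illustration: the keys of the docs.yml-style structure, the `tap-example` connector, and the `render_readme` / `render_hub_entry` helpers are all hypothetical, and the file is shown already parsed into a dict so the example needs no YAML library.

```python
# Hypothetical docs.yml content, shown already parsed into a dict.
# Every key and value below is an assumption made for illustration.
DOCS = {
    "name": "tap-example",
    "label": "Example",
    "description": "Extracts data from the Example service.",
    "settings": [
        {"name": "api_key", "description": "API key used to authenticate."},
        {"name": "start_date", "description": "Earliest record date to sync."},
    ],
    "sections": {
        "Advanced Settings": "Use `start_date` to limit the **initial** sync.",
    },
}


def render_readme(docs: dict) -> str:
    """Render a markdown README from the shared docs definition."""
    lines = [f"# {docs['name']}", "", docs["description"], "", "## Settings", ""]
    for setting in docs["settings"]:
        lines.append(f"- `{setting['name']}`: {setting['description']}")
    # Free-form markdown sections are passed through untouched.
    for title, body in docs.get("sections", {}).items():
        lines += ["", f"## {title}", "", body]
    return "\n".join(lines) + "\n"


def render_hub_entry(docs: dict) -> dict:
    """Build a hub-style definition from the same input so both stay in sync."""
    return {
        "name": docs["name"],
        "label": docs["label"],
        "description": docs["description"],
        "settings": [dict(setting) for setting in docs["settings"]],
    }


if __name__ == "__main__":
    print(render_readme(DOCS))
    print(render_hub_entry(DOCS))
```

Because both outputs are derived from the same structure, a change to a setting description only has to be made once in the connector repo.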
Version Constraints
I agree that we need some sort of versioning mechanism so that users get the appropriate metadata based on their pinned package version. This has started to feel less pressing for me over time, but it is still important. Last time I brought it up, I thought it was going to cause more problems than it did. I think that's because the life cycle of most taps is heavy development in the beginning, and after that the config/metadata structure doesn't change all that much even if the internals of the tap change a lot. On major version bumps, though, it does become a problem, for example target-s3, which refactored its whole config to support multi-cloud config options. From my perspective we have 3 options:
Option 1
Like I said above, it's feeling like less of an issue, especially with lock files being the recommendation, but for breaking changes it's painful. It's also hard to evaluate how bad of an issue this currently is; mostly these would cause failures that lead the user to think it's a tap issue.
Option 2
I've been noodling with this idea of solving versioning by bypassing the hub altogether. It's not a fully thought-out idea yet, but the concept is that a Meltano project pins a package version and the package has the metadata for that exact version of code (e.g. …). The hub starts to go back to its roots as a place to discover plugins and their settings but doesn't try to be the source of truth for all connector metadata, in line with the changes we're discussing in #1205 (comment).
Option 3
Build a versioning mechanism within the hub. Meltano would need to have a more explicit understanding of version, because it's going to have a hard time requesting metadata for a version if the version needs to be parsed out of a long pip_url string blob. Once Meltano knows the exact version that is pinned, it can call the hub API with that as a parameter, and the hub would need to resolve that version to a version range associated with metadata files. I think we could use versioning syntax like poetry does …
My opinion right now would be for Option 2. It bypasses a lot of complex work in the hub and will likely be more accurate. It does require more complexity on the Meltano side to understand raw SDK output, but I don't think that's the worst.
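For reference, here is a rough sketch of what the Option 3 resolution step could look like, assuming the version is pinned with `==` in the pip_url and that metadata files are keyed by poetry-style caret constraints. The `target-example` plugin, the file names, and all of the function names are made up for illustration; this is not how the hub or Meltano works today, and pre-release or non-pinned URLs are simply treated as unresolvable.

```python
import re
from typing import Optional

# Hypothetical mapping from caret constraints to metadata files on the hub.
RANGES = {
    "^1.0": "target-example.v1.yml",
    "^2.0": "target-example.v2.yml",
}


def parse_version(text: str) -> tuple:
    """Turn '1.2' or '1.2.3' into a 3-part integer tuple, padding with zeros."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def pinned_version(pip_url: str) -> Optional[str]:
    """Extract an exact '==X.Y.Z' pin from a pip_url, or None if not pinned."""
    match = re.search(r"==\s*(\d+(?:\.\d+){0,2})", pip_url)
    return match.group(1) if match else None


def caret_bounds(constraint: str) -> tuple:
    """Return (inclusive lower, exclusive upper) bounds for a caret constraint.

    The upper bound bumps the first non-zero part that was written, or the
    last written part if they are all zero, e.g. '^1.2' -> 2.0.0 and
    '^0.2.3' -> 0.3.0.
    """
    raw = [int(part) for part in constraint.lstrip("^").split(".")]
    lower = tuple(raw + [0] * (3 - len(raw)))
    index = next((i for i, part in enumerate(raw) if part != 0), len(raw) - 1)
    upper = lower[:index] + (lower[index] + 1,) + (0,) * (2 - index)
    return lower, upper


def resolve_metadata(pip_url: str, ranges: dict) -> Optional[str]:
    """Pick the metadata file whose range contains the pinned version."""
    version = pinned_version(pip_url)
    if version is None:
        return None  # nothing pinned, so there is no exact version to resolve
    parsed = parse_version(version)
    for constraint, filename in ranges.items():
        lower, upper = caret_bounds(constraint)
        if lower <= parsed < upper:
            return filename
    return None


if __name__ == "__main__":
    print(resolve_metadata("target-example==1.4.2", RANGES))
    print(resolve_metadata("git+https://example.com/target-example.git", RANGES))
```

The second call shows the weak spot mentioned above: a pip_url without an explicit pin gives the hub nothing to resolve against, which is part of why Option 2 looks simpler to me.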
@kgpayne @edgarrmondragon I mentioned this in the guild meeting today but I'm curious what your thoughts are.
Problem: The problem we're trying to solve generally is that Meltano doesn't control all of the Python packages in the larger Singer ecosystem. The Hub applies metadata about a specific package, but this is often out of sync with the repository itself. This discussion is about defining an ideal code+metadata state between the repo, Hub, and SDK.
Prior Topics:
Miro Board where I'm trying to visualize some of this: https://miro.com/app/board/uXjVMQEGFps=/?share_link_id=882576162622
Proposed Tenets
Ideal Scenario