Start adding java ETL examples, starting with kafka etl. #1805
We've had a few requests to start providing Java examples rather than Python, for type safety. I've also started adding these to metadata-ingestion-examples to make it clearer that these are *examples*. They can be used directly or as a basis for other things. This is a work in progress. After we port all the examples to Java, we'll delete the Python versions.
Is the intention to eventually drop all Python-based ETL scripts or have them co-exist? We even have some closure ETL under contrib, wondering if we should have a language-specific structure, e.g. /metadata-ingestion/java and /metadata-ingestion/python instead?
...estion-examples/common/src/main/java/com/linkedin/metadata/examples/configs/KafkaConfig.java
Is it possible to migrate the ingestion scripts under contrib?
I did initially have that (under So yes, I pivoted and my new idea was to delete the Python examples afterward.
The only downside is we may stop maintaining them.
I'd say, for right now, put this ingestion example under
Replacing all Python examples might take time, and I'd expect some people will still prefer Python over Java. Maintaining both is indeed non-ideal either so one option is to eventually send the Python scripts to
I'll move them to contrib.
Involved moving some files in contrib/metadata-ingestion to a specific jdbc directory first.
@mars-lan sorry for the delay, PTAL. I'll move each example as I port them.
Some got moved to
See that commit description. The contrib/metadata-ingestion dir seemed to just be a jdbc example at the top level. I moved those into a jdbc dir.
e.g. I don't see any mention of jdbc in this? It's using the Python hive lib directly? https://github.com/linkedin/datahub/blob/12607e30c6f33c35652c425bd983e69ac0860543/contrib/metadata-ingestion/jdbc/bin/dataset-hive-generator.py. All the
That commit as in this one. Not 1805 overall lol. So if you look at the state of things before, it was
I assumed everything that wasn't Part of the issue here is lack of structure and documentation to start with. It is not clear to me what these are for. I want this to be very well structured, with everything under
I don't really want to block the java ports on cleaning up contrib, unless you have a clear path forward. Are we sure we can't just delete these python examples? :p
Let's move the existing Haskell stuff to
Done
@jplaisted Why are we maintaining separate
Two directories are creating confusion, I guess.
In theory, if we port all the examples to Java, we'll have one folder. I'll try to find more time to continue porting things.
#1743
Checklist