
feat(ingest): kafka connect metadata ingestion #2516

@taufiqibrahim (Contributor) commented May 8, 2021

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

Description

Adds metadata ingestion for Kafka Connect, based on the Kafka Connect REST API. Currently limited to Debezium-based source connectors; sink connectors are not yet implemented.
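For reference, here is a minimal sketch of the REST calls involved. It assumes a Connect worker reachable at a base URL such as `http://localhost:8083` (the Kafka Connect default REST port). `GET /connectors`, `GET /connectors/{name}` and `GET /connectors/{name}/topics` are standard Kafka Connect REST endpoints; the last one requires Apache Kafka 2.5 or later. The helper names `fetch_connectors` and `http_get_json` are hypothetical and not taken from this PR, which may use a different HTTP client.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, List

# A fetcher takes a URL and returns the decoded JSON body (hypothetical helper type).
Fetcher = Callable[[str], Any]


def http_get_json(url: str) -> Any:
    """Default fetcher: a plain GET against the Kafka Connect REST API."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def fetch_connectors(connect_uri: str, fetch: Fetcher = http_get_json) -> List[Dict[str, Any]]:
    """Return name, type, config and topics for every connector on the worker."""
    base = connect_uri.rstrip("/")
    connectors = []
    # GET /connectors -> ["name-a", "name-b", ...]
    for name in fetch(f"{base}/connectors"):
        quoted = urllib.parse.quote(name, safe="")
        # GET /connectors/{name} -> {"name", "config", "tasks", "type"}
        info = fetch(f"{base}/connectors/{quoted}")
        # GET /connectors/{name}/topics -> {name: {"topics": [...]}}
        topics = fetch(f"{base}/connectors/{quoted}/topics")[name]["topics"]
        connectors.append({
            "name": name,
            "type": info.get("type"),
            "config": info.get("config", {}),
            "topics": topics,
        })
    return connectors
```

The fetcher is injectable, so the logic can be exercised against canned JSON without a running Connect worker. That also makes it a convenient seam for the tests requested in review.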

Implementation summary:

  • Fetch connectors from the Kafka Connect REST API
  • Get each connector's topics by connector name
  • Parse topic names according to each Debezium connector's topic-naming convention
  • Generate DataFlowSnapshot and DataJobSnapshot entities
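The last two steps can be sketched as follows. The sketch assumes Debezium 1.x topic naming (`<database.server.name>.<schema or database>.<table>`) and the usual DataHub URN shapes for DataFlow, DataJob and dataset entities. The orchestrator id `kafka-connect`, the one-job-per-table layout and the helper names are hypothetical. The sketch only builds the URNs that DataFlowSnapshot and DataJobSnapshot entities would be keyed on, not the snapshot objects themselves.

```python
from typing import Dict, List, Optional


def parse_debezium_topic(topic: str, server_name: str) -> Optional[str]:
    """Return the source table ("schema.table" or "db.table") encoded in a
    Debezium data topic named "<server_name>.<schema-or-db>.<table>"."""
    prefix = server_name + "."
    if not topic.startswith(prefix):
        # Heartbeat topics, the bare "<server_name>" schema-change topic, etc.
        return None
    table = topic[len(prefix):]
    # Data topics carry exactly two more segments after the server name.
    return table if table.count(".") == 1 else None


def build_urns(connector_name: str, source_platform: str, server_name: str,
               topics: List[str], env: str = "PROD") -> List[Dict[str, str]]:
    # Hypothetical orchestrator id "kafka-connect" for the DataFlow URN.
    flow_urn = f"urn:li:dataFlow:(kafka-connect,{connector_name},{env})"
    jobs = []
    for topic in topics:
        table = parse_debezium_topic(topic, server_name)
        if table is None:
            continue
        jobs.append({
            "flow": flow_urn,
            # One DataJob per captured table (an assumption of this sketch).
            "job": f"urn:li:dataJob:({flow_urn},{table})",
            "input": f"urn:li:dataset:(urn:li:dataPlatform:{source_platform},{table},{env})",
            "output": f"urn:li:dataset:(urn:li:dataPlatform:kafka,{topic},{env})",
        })
    return jobs
```

Here the input dataset name is just the Debezium table path (for example `public.orders`). A real source would likely need to prefix the database name so that it matches the dataset names emitted by DataHub's own database sources.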

@taufiqibrahim changed the title Kafka Connect Metadata Ingestion feat(ingest): Kafka Connect Metadata Ingestion May 8, 2021
@taufiqibrahim changed the title feat(ingest): Kafka Connect Metadata Ingestion feat(ingest): kafka connect metadata ingestion May 8, 2021
@taufiqibrahim marked this pull request as ready for review May 9, 2021 07:21
@hsheth2 (Collaborator) left a comment


This is awesome - thanks @taufiqibrahim!

Would love it if you updated the README.md file with some basic docs and also added some tests for this source 🙂


if connector_class == 'io.debezium.connector.vitess.VitessConnector':
    server_name = connector_config.get("database.server.name")
    source_platform = 'vitess'
Collaborator


This bit seems super repetitive - any way we can extract this configuration into a table?
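One way to act on this suggestion is a lookup table keyed by `connector.class`, which replaces one `if` branch per connector with a single dictionary lookup. This is a minimal sketch. Only the Vitess entry comes from the snippet above. The other connector class names are the standard Debezium 1.x ones, while the DataHub platform strings and the `mongodb.name` key are assumptions. `DebeziumSource`, `DEBEZIUM_SOURCES` and `resolve_source` are hypothetical names.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class DebeziumSource(NamedTuple):
    platform: str         # DataHub platform name for the upstream database (assumed)
    server_name_key: str  # config key holding the logical server name


# Hypothetical lookup table replacing per-class if/elif branches.
DEBEZIUM_SOURCES: Dict[str, DebeziumSource] = {
    "io.debezium.connector.mysql.MySqlConnector": DebeziumSource("mysql", "database.server.name"),
    "io.debezium.connector.postgresql.PostgresConnector": DebeziumSource("postgres", "database.server.name"),
    "io.debezium.connector.sqlserver.SqlServerConnector": DebeziumSource("mssql", "database.server.name"),
    "io.debezium.connector.oracle.OracleConnector": DebeziumSource("oracle", "database.server.name"),
    "io.debezium.connector.db2.Db2Connector": DebeziumSource("db2", "database.server.name"),
    "io.debezium.connector.mongodb.MongoDbConnector": DebeziumSource("mongodb", "mongodb.name"),
    "io.debezium.connector.vitess.VitessConnector": DebeziumSource("vitess", "database.server.name"),
}


def resolve_source(connector_config: Dict[str, str]) -> Optional[Tuple[str, Optional[str]]]:
    """Return (source_platform, server_name) for a known Debezium connector, else None."""
    entry = DEBEZIUM_SOURCES.get(connector_config.get("connector.class", ""))
    if entry is None:
        return None  # not a Debezium source connector this table knows about
    return entry.platform, connector_config.get(entry.server_name_key)
```

Adding support for another Debezium connector then becomes a one-line table entry rather than a new branch.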

metadata-ingestion/setup.py (outdated, resolved)
@taufiqibrahim (Contributor, Author)

Thanks @hsheth2 for your valuable feedback. Let me make the fixes.

@shirshanka (Contributor)

Looks like some lint checks are failing.

You can fix it via: ./gradlew :metadata-ingestion:lintFix

@taufiqibrahim (Contributor, Author)

Let me try this command. Thanks @shirshanka

@shirshanka (Contributor) left a comment


Thanks for the contribution @taufiqibrahim!

@shirshanka merged commit db78373 into datahub-project:master May 18, 2021
@taufiqibrahim deleted the taufiqibrahim-ingestion-kafkaconnect branch July 20, 2021 10:28

3 participants