add initial clickhouse support #4057

Merged
merged 4 commits into from
Feb 21, 2022

Contributor

@ne1r0n ne1r0n commented Feb 4, 2022

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

github-actions bot commented Feb 4, 2022

Unit Test Results (build & test)

 70 files ±0   70 suites ±0   13m 1s ⏱️ + 1m 3s
611 tests ±0  552 ✔️ ±0  59 💤 ±0  0 ❌ ±0

Results for commit e6c23c4. ± Comparison against base commit 6b5fba8.

♻️ This comment has been updated with latest results.

github-actions bot commented Feb 4, 2022

Unit Test Results (metadata ingestion)

    5 files ±0      5 suites ±0   41m 21s ⏱️ - 3m 17s
  341 tests +4    341 ✔️ +4    0 💤 ±0  0 ❌ ±0
1 552 runs  +8  1 514 ✔️ +8   38 💤 ±0  0 ❌ ±0

Results for commit e6c23c4. ± Comparison against base commit 6b5fba8.

♻️ This comment has been updated with latest results.

Contributor

@treff7es treff7es left a comment

Awesome work, thank you.
I left a few minor comments.
The build is currently failing; I think you need to add ClickHouse as a dependency for the integration tests in setup.py.

base.ischema_names["UInt256"] = INTEGER

register_custom_type(custom_types.common.Array, ArrayTypeClass)
register_custom_type(custom_types.ip.IPv4, NumberTypeClass)
Contributor

IPv6 is mapped as string but this one as Number, why?

Contributor Author

In ClickHouse, those two IP types are based on different underlying data types.
More info is in the ClickHouse documentation for IPv4 and IPv6.

Contributor

Thanks for the clarification.
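
The distinction discussed in this thread can be illustrated with a small sketch. It assumes, per the ClickHouse documentation, that IPv4 is a domain over UInt32 and IPv6 is a domain over FixedString(16). The helper names below are hypothetical and are not part of this PR; the sketch uses only Python's standard `ipaddress` module.

```python
import ipaddress


def ipv4_as_uint32(text: str) -> int:
    """Return the unsigned 32-bit integer that encodes an IPv4 address."""
    return int(ipaddress.IPv4Address(text))


def ipv6_as_fixed_bytes(text: str) -> bytes:
    """Return the 16-byte big-endian encoding of an IPv6 address."""
    return ipaddress.IPv6Address(text).packed


# An IPv4 address fits in a single integer column, hence a number type.
v4 = ipv4_as_uint32("192.168.0.1")
print(v4)  # 3232235521

# An IPv6 address needs 16 bytes, hence a fixed-length string type.
v6 = ipv6_as_fixed_bytes("2001:db8::1")
print(len(v6))  # 16
```

Under that assumption, mapping IPv4 to NumberTypeClass and IPv6 to a string type mirrors the storage type behind each domain.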

        yield wu

    def _get_db_name(self) -> str:
        return getattr(self.config, "database_alias")
Contributor

Is database alias a mandatory config parameter?
As I can see there is a database property as well.
Is it possible to set only database and not database_alias?

Contributor Author

@ne1r0n ne1r0n Feb 9, 2022

The database parameter is used only for the connection.
database_alias is used as the platform_instance and most likely should be replaced by it.
Neither parameter is mandatory.

Contributor Author

ClickHouse has a schema.table object structure.

Contributor

ahh, thanks

Contributor

@ne1r0n: I would recommend using the platform_instance config going forward and dropping the database_alias option from this connector, now that we have platform instance support in the core classes.

Contributor Author

@shirshanka I've changed database_alias to platform_instance
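
A rough sketch of what the switch to platform_instance means for dataset naming follows. It assumes dataset URNs take the shape `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)`, that the name is the schema.table pair mentioned above, and that an optional platform instance is prepended to it. `sketch_dataset_urn` is a hypothetical helper written for illustration; it is not DataHub's API and not the code merged in this PR.

```python
from typing import Optional


def sketch_dataset_urn(
    platform: str,
    schema: str,
    table: str,
    platform_instance: Optional[str] = None,
    env: str = "PROD",
) -> str:
    """Build an illustrative dataset URN (assumed layout, see lead-in)."""
    # ClickHouse objects are addressed as schema.table.
    name = f"{schema}.{table}"
    # Assumption: the platform instance, when set, prefixes the dataset name.
    if platform_instance:
        name = f"{platform_instance}.{name}"
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# Neither setting is mandatory, so the URN is valid with or without an instance.
print(sketch_dataset_urn("clickhouse", "default", "events"))
print(sketch_dataset_urn("clickhouse", "default", "events", platform_instance="cluster_a"))
```

The instance name `cluster_a` and the table `default.events` are made-up values used only to show the two cases.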

@ne1r0n ne1r0n force-pushed the clickhouse_support branch 3 times, most recently from 3d4fc8e to 94b0235 Compare February 13, 2022 21:10
@ne1r0n ne1r0n force-pushed the clickhouse_support branch from 94b0235 to 56505b0 Compare February 13, 2022 21:15
Contributor

@shirshanka shirshanka left a comment

LGTM! Will merge after CI is green.

@shirshanka shirshanka merged commit c2065bd into datahub-project:master Feb 21, 2022
maggiehays pushed a commit to maggiehays/datahub that referenced this pull request Aug 1, 2022