-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ingest): add classification for sql sources #10013
feat(ingest): add classification for sql sources #10013
Conversation
309e4ab
to
764ea8a
Compare
metadata-ingestion/src/datahub/ingestion/source/sql/data_reader.py
Outdated
Show resolved
Hide resolved
else: | ||
query = sa.select([sa.text("*")]).select_from(table).limit(sample_size) | ||
query_results = self.engine.execute(query) | ||
# Not ideal - creates a parallel structure. Can we use pandas here ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah refactoring the query generation a bit would be nice
metadata-ingestion/src/datahub/ingestion/source/sql/data_reader.py
Outdated
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py
Outdated
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/source/sql/data_reader.py
Outdated
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py
Outdated
Show resolved
Hide resolved
Co-authored-by: Harshal Sheth <[email protected]>
…r.py Co-authored-by: Harshal Sheth <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few minor comments, but otherwise looks good
metadata-ingestion/src/datahub/ingestion/source/sql/data_reader.py
Outdated
Show resolved
Hide resolved
|
||
# Not ideal - creates a parallel structure in column_values. Can we use pandas here ? | ||
for row in query_results.fetchall(): | ||
if isinstance(row, LegacyRow): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this LegacyRow thing seems fishy - is row
always going to be a LegacyRow? won't it be a Row
object in some cases?
we're on sqlalchemy 1.4, so LegacyRow should be deprecated https://docs.sqlalchemy.org/en/20/changelog/migration_14.html#rowproxy-is-no-longer-a-proxy-is-now-called-row-and-behaves-like-an-enhanced-named-tuple
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Given that LegacyRow import and functionality in sql_common.py is working fine, the current code flow should also work well for now. I've added fix for this in followup PR.
…sification_for_sql_sources' into master+ing-319-classification_for_sql_sources
Checklist