-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[sqlalchemy] Improve the performance of TrinoDialect.get_view_names #267
[sqlalchemy] Improve the performance of TrinoDialect.get_view_names #267
Conversation
a5a6c75
to
e0de7b9
Compare
f"trino://test@{host}:{port}/{request.param}", | ||
connect_args={ | ||
"max_attempts": 1, | ||
"schema": "test", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed to explicitly set the schema as part of the engine otherwise Trino barfed when trying to create the views with the
Schema must be specified when session schema is not set
error. I tried a number of things including explicitly including i) executing the USE test
statement, and ii) including the schema name in the view name, i.e., test.my_view
however this failed like because this was escaped to be "test.my_view"
rather than "test"."my_view"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On first sight it can be done by doing something like:
create_view(
...,
schema="test")
I also think you could include the schema in the pytest param @pytest.mark.parametrize('trino_connection', ['memory/test'], indirect=True)
but personally I would prefer to have it in the create_view
call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mdesmet the create_view method does not support setting an explicit schema. Regrettably also defining,
metadata = sqla.MetaData(schema=schema)
did not work and thus the only solution I was able to find—I spent hours trying to get this to work—was to explicitly set the schema at the time of the engine creation.
Additionally setting the schema in the engine ensures that _get_default_schema_name returns a non-None value which is necessary for the additional tests I added to test existing logic which wasn't previously tested.
@hashhar are you aware from where the performance difference comes from? It might be that query to |
@@ -368,3 +379,45 @@ def test_get_table_comment(trino_connection): | |||
assert actual['text'] is None | |||
finally: | |||
metadata.drop_all(engine) | |||
|
|||
|
|||
@pytest.mark.parametrize('trino_connection', ['memory'], indirect=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of overriding the schema for any test you can just do
@pytest.mark.parametrize('trino_connection', ['memory'], indirect=True) | |
@pytest.mark.parametrize('trino_connection', ['memory/test'], indirect=True) |
metadata.drop_all(engine) | ||
|
||
|
||
@patch('trino.sqlalchemy.dialect.TrinoDialect._get_default_schema_name') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you remove the default schema from the fixture you can rid yourself of this @patch
e0de7b9
to
a11c234
Compare
@mdesmet thanks for the review. I've addressed your comments. |
71ebc98
to
4662c06
Compare
@hashhar : PTAL I think we should make this change based on your feedback on Slack
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix the commit title as "Improve the performance of get_view_names in SQLAlchemy"
WHERE "table_schema" = :schema | ||
AND "table_type" = 'VIEW' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth leaving code comment why this query uses information_schema.tables
.
setup.py
Outdated
@@ -39,6 +39,7 @@ | |||
"pytest", | |||
"pytest-runner", | |||
"click", | |||
"sqlalchemy-utils", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that we're now starting to test integration with a non-SQLAlchemy library. This is one of many other alternative libraries which enhance SQLAlchemy.
(This is needed because SQLAlchemy itself doesn't provide inbuilt functionality to create views - it does provide extension points which the sqlalchemy-utils library implements).
99ca619
to
5f8d4e2
Compare
5f8d4e2
to
08f7a1c
Compare
Description
Querying the
information_schema.views
table seems to be sub-performant for determining which entities are views—I speculate this is because all the view definitions need to be extracted/compiled. A more performant approach is to useinformation_schema.tables
with the appropriate type filter.For example for a very large schema at Airbnb (comprising of over 100k entities) the following query,
took ~ 5 seconds, whereas:
was still running after 10 minutes.
Note there were no prior integration tests for the
TrinoDialect.get_view_names
method so I added some to cover this logic and threw in a few extra for free which should cover all the various scenarios.Non-technical explanation
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: