fix(snowflake): opt-in denormalization of column names #24982
Conversation
def upgrade():
    op.add_column(
        "tables",
        sa.Column("normalize_columns", sa.Boolean(), nullable=True, default=False),
What's the difference between NULL and FALSE?
@john-bodley they're essentially the same. Would you prefer I change it to just nullable=True without a default value, or just have the default value?
If we only need two states, then I would stick with TRUE and FALSE, i.e., non-nullable, unless there's a performance or storage cost for using FALSE rather than NULL, which likely will be the predominant value.
+1
It seems SQLAlchemy is slightly flaky when it comes to assigning default values with the NULL constraint in place. I dug around and found that the is_sqllab_viz flag on the SqlaTable model is expected to work similarly, and there the migration also had to allow for nullable=True. So reverting back to that.
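The approach settled on above can be illustrated with a minimal, self-contained sketch (plain sqlite3 rather than the actual Alembic migration; the table layout here is illustrative only): add the column as nullable with no NOT NULL constraint, then backfill existing rows to TRUE so legacy datasets keep their old behavior.

```python
import sqlite3

# Illustrative stand-in for the pre-migration schema (not Superset's real DDL)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tables (id INTEGER PRIMARY KEY, table_name TEXT)")
# a dataset row that existed before the migration
conn.execute("INSERT INTO tables (table_name) VALUES ('legacy')")

# the migration step: a nullable BOOLEAN, sidestepping the default-value
# flakiness discussed above, followed by a backfill for existing rows
conn.execute("ALTER TABLE tables ADD COLUMN normalize_columns BOOLEAN")
conn.execute("UPDATE tables SET normalize_columns = 1")

rows = conn.execute(
    "SELECT table_name, normalize_columns FROM tables"
).fetchall()
print(rows)  # [('legacy', 1)]
```

New rows inserted after the migration get the application-side default of False, while the backfill keeps every pre-existing dataset on the old normalizing behavior.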
@@ -105,6 +105,7 @@ def test_external_metadata_by_name_for_physical_table(self):
    "database_name": tbl.database.database_name,
    "schema_name": tbl.schema,
    "table_name": tbl.table_name,
    "normalize_columns": tbl.normalize_columns,
Can we add some tests that create a dataset with/without this value to check for API backward compatibility?
Good idea, I'll do that 👍
+1
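The backward-compatibility check suggested above could be sketched like this. Note that create_dataset here is a hypothetical stand-in for the dataset API payload handling, not Superset's actual implementation:

```python
def create_dataset(payload: dict) -> dict:
    # Hypothetical payload handler: old clients omit the new field,
    # so the API must fall back to the default of False
    return {
        "table_name": payload["table_name"],
        "normalize_columns": payload.get("normalize_columns", False),
    }


# payload without the new field (pre-#24982 client) must still work
legacy = create_dataset({"table_name": "t"})
assert legacy["normalize_columns"] is False

# payload that opts in explicitly
opted_in = create_dataset({"table_name": "t", "normalize_columns": True})
assert opted_in["normalize_columns"] is True
```

The real tests would exercise the dataset REST endpoint with and without the field, asserting that both payload shapes succeed and that the stored flag matches.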
LGTM. Thank you for the fix @villebro!
Looks great!
(cherry picked from commit f94dc49)
SUMMARY
PR #24471, which was meant to harmonize column naming for Oracle-like databases such as Snowflake, caused issues for deployments that relied on the previous behavior of normalizing column names for physical datasets. This PR adds a field normalize_columns to the Dataset/SQLA Table models. It defaults to False for new datasets, but for old datasets it is set to True via a db migration to ensure we don't break existing datasets.

For existing datasets, "Normalize columns" is checked:
When checked, the behavior is consistent with how it was previously, i.e. physical datasets on Snowflake have normalized column names:
For new datasets, the checkbox is unchecked:
In this case, a physical dataset on Snowflake will denormalize the columns, usually showing them as UPPERCASE:
This means that for new datasets, column names are no longer normalized unless the flag is checked.
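The opt-in behavior described above can be sketched as follows. The helper name resolve_column_name is illustrative, not Superset's actual API:

```python
def resolve_column_name(name: str, normalize_columns: bool) -> str:
    # With the flag on, mimic the pre-#24982 behavior: lower-case the
    # name the way Oracle-like engines (e.g. Snowflake) are normalized.
    if normalize_columns:
        return name.lower()
    # Flag off (default for new datasets): keep the engine's native
    # casing, which on Snowflake is usually UPPERCASE.
    return name


print(resolve_column_name("ORDER_ID", normalize_columns=True))   # order_id
print(resolve_column_name("ORDER_ID", normalize_columns=False))  # ORDER_ID
```

Existing datasets get the flag set to True by the migration, so their charts keep resolving the same lower-cased column names as before.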
TESTING INSTRUCTIONS
ADDITIONAL INFORMATION