fix(snowflake): opt-in denormalization of column names #24982
Conversation
def upgrade():
    op.add_column(
        "tables",
        sa.Column("normalize_columns", sa.Boolean(), nullable=True, default=False),
What's the difference between NULL and FALSE?
@john-bodley they're essentially the same. Would you prefer I change it to just nullable=True without a default value, or just have the default value?
If we only need two states, then I would stick with TRUE and FALSE, i.e., non-nullable, unless there's a performance or storage cost for using FALSE rather than NULL, which likely will be the predominant value.
+1
It seems SQLAlchemy is slightly flaky when it comes to assigning default values with the NULL constraint in place. I dug around and found that the is_sqllab_viz flag on the SqlaTable model is expected to work similarly, and there the migration also had to allow for nullable=True. So reverting back to that.
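The approach settled on above can be illustrated with a minimal, self-contained sketch (plain sqlite3 rather than the actual Alembic migration; the table layout here is illustrative only): add the column as nullable with no NOT NULL constraint, then backfill existing rows to TRUE so legacy datasets keep their old behavior.

```python
import sqlite3

# Illustrative stand-in for the pre-migration schema (not Superset's real DDL)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tables (id INTEGER PRIMARY KEY, table_name TEXT)")
# a dataset row that existed before the migration
conn.execute("INSERT INTO tables (table_name) VALUES ('legacy')")

# the migration step: a nullable BOOLEAN, sidestepping the default-value
# flakiness discussed above, followed by a backfill for existing rows
conn.execute("ALTER TABLE tables ADD COLUMN normalize_columns BOOLEAN")
conn.execute("UPDATE tables SET normalize_columns = 1")

rows = conn.execute(
    "SELECT table_name, normalize_columns FROM tables"
).fetchall()
print(rows)  # [('legacy', 1)]
```

New rows inserted after the migration get the application-side default of False, while the backfill keeps every pre-existing dataset on the old normalizing behavior.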
@@ -105,6 +105,7 @@ def test_external_metadata_by_name_for_physical_table(self):
    "database_name": tbl.database.database_name,
    "schema_name": tbl.schema,
    "table_name": tbl.table_name,
    "normalize_columns": tbl.normalize_columns,
Can we add some tests that create a dataset with/without this value to check for API backward compatibility?
Good idea, I'll do that 👍
+1
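The backward-compatibility check suggested above could be sketched like this. Note that create_dataset here is a hypothetical stand-in for the dataset API payload handling, not Superset's actual implementation:

```python
def create_dataset(payload: dict) -> dict:
    # Hypothetical payload handler: old clients omit the new field,
    # so the API must fall back to the default of False
    return {
        "table_name": payload["table_name"],
        "normalize_columns": payload.get("normalize_columns", False),
    }


# payload without the new field (pre-#24982 client) must still work
legacy = create_dataset({"table_name": "t"})
assert legacy["normalize_columns"] is False

# payload that opts in explicitly
opted_in = create_dataset({"table_name": "t", "normalize_columns": True})
assert opted_in["normalize_columns"] is True
```

The real tests would exercise the dataset REST endpoint with and without the field, asserting that both payload shapes succeed and that the stored flag matches.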
LGTM. Thank you for the fix @villebro!
Looks great!
(cherry picked from commit f94dc49)
SUMMARY
PR #24471, which was meant to harmonize column naming for Oracle-like databases such as Snowflake, caused issues for deployments that relied on the previous behavior of normalizing column names for physical datasets. This PR adds a field normalize_columns to the Dataset/SQLA Table models. It defaults to False for new datasets, but for old datasets it is set to True via a db migration to ensure we don't break existing datasets.

For existing datasets, "Normalize columns" is checked:
When checked, the behavior is consistent with how it was previously, i.e. physical datasets on Snowflake have normalized column names:
For new datasets, the checkbox is unchecked:
In this case, a physical dataset on Snowflake will denormalize the columns, usually showing them as UPPERCASE:
This means that for new datasets, column names are no longer normalized unless the flag is checked.
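The opt-in behavior described above can be sketched as follows. The helper name resolve_column_name is illustrative, not Superset's actual API:

```python
def resolve_column_name(name: str, normalize_columns: bool) -> str:
    # With the flag on, mimic the pre-#24982 behavior: lower-case the
    # name the way Oracle-like engines (e.g. Snowflake) are normalized.
    if normalize_columns:
        return name.lower()
    # Flag off (default for new datasets): keep the engine's native
    # casing, which on Snowflake is usually UPPERCASE.
    return name


print(resolve_column_name("ORDER_ID", normalize_columns=True))   # order_id
print(resolve_column_name("ORDER_ID", normalize_columns=False))  # ORDER_ID
```

Existing datasets get the flag set to True by the migration, so their charts keep resolving the same lower-cased column names as before.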
TESTING INSTRUCTIONS
ADDITIONAL INFORMATION