-
Notifications
You must be signed in to change notification settings - Fork 769
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix(schema): don't specialize type variable in MappingSchema #2394
Conversation
Hmm I guess this is breaking because mypy may yell if a current user of |
class MappingSchema(AbstractMappingSchema[t.Dict[str, str]], Schema): | ||
class MappingSchema(AbstractMappingSchema, Schema): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @barakalon can you remind me why this was t.Dict[str, str]
to begin with? It seems like it was first introduced in f50b2918b but I don't remember the context.
We clearly allow both str
and DataType
(e.g. we have code that considers these two cases for the columns' types, see get_column_type
), so an alternative I considered was updating this type to be t.Dict[str, str | DataType]
, but this would then be breaking and the user would have to explicitly cast the result of AbstractMappingSchema.find
, which is annoying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably no good reason. Schema pre-exists a lot of maturation of DataType.
removing the type parameter implicitly makes this AbstractMappingSchema[Any]. Is that what we want?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not fully sure; my reasoning was that being more strict by specifying the type as a union of str
and DataType
would be more annoying for users because they'd have to account for both cases. I'm also not sure if allowing str
column types instead of converting them to DataType
s is helpful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only place that type parameter is used: https://github.com/tobymao/sqlglot/blob/main/sqlglot/schema.py#L165
and it seems like that’s meant to be a protected method. So maybe just remove the generic altogether? Might not be worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to remove it altogether too, I don't see much value in it if we don't specialize the type anyway.
it seems like that’s meant to be a protected method
Not sure about this one though, e.g. we're using find
in SQLMesh as a way to get the columns-to-types mapping of a given table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And do you coerce from string|DataType to DataType?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC we store the column types as DataType
values initially so it just returns them later on
In SQLMesh we use the
MappingSchema
to storeDataType
s and it works just fine. Not sure why we restricted the type hint to juststr
for the types to begin with.