-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-49992][SQL] Default collation resolution for DDL and DML queries #48844
base: master
Are you sure you want to change the base?
[SPARK-49992][SQL] Default collation resolution for DDL and DML queries #48844
Conversation
@@ -30,7 +30,7 @@ import org.apache.spark.sql.catalyst.util.CollationFactory | |||
* The id of collation for this StringType. | |||
*/ | |||
@Stable | |||
class StringType private (val collationId: Int) extends AtomicType with Serializable { | |||
class StringType private[sql] (val collationId: Int) extends AtomicType with Serializable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to do this, as we have def apply(collationId: Int)
in object StringType
@@ -43,7 +43,7 @@ import org.apache.spark.sql.types._ | |||
case class CreateTable( | |||
tableDesc: CatalogTable, | |||
mode: SaveMode, | |||
query: Option[LogicalPlan]) extends LogicalPlan { | |||
query: Option[LogicalPlan]) extends LogicalPlan with V1DDLCommand { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this works. We need to resolve string collation in tableDesc.schema
, and the rule must match CreateTable
directly to do it.
import org.apache.spark.sql.internal.SqlApiConf | ||
import org.apache.spark.sql.types.StringType | ||
|
||
class DefaultCollationTestSuite extends DatasourceV2SQLBase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this test cover v1 commands as well?
What changes were proposed in this pull request?
This PR proposes not using session-level collation in DDL commands (create/alter view/table, add/replace columns).
Also, resolution of default collation should happen in the analyzer and not in the parser. However, due to how we are checking for default string type (using reference equals with
StringType
object) we cannot just replace this object withStringType("UTF8_BINARY")
because they compare as equal so the tree node framework will just return the old plan. Because of this we have to perform this resolution twice, once by changing theStringType
object into aTemporaryStringType
and then back toStringType("UTF8_BINARY")
which is not considered a default string type anymore.Why are the changes needed?
The default collation for DDL commands should be tied to the object being created or altered (e.g., table, view, schema) rather than the session-level setting. Since object-level collations are not yet supported, we will assume the UTF8_BINARY collation by default for now.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added new unit tests.
Was this patch authored or co-authored using generative AI tooling?
No.