
[BugFix]Only when unsupported column is referred, exception will be thrown. #8881

Merged
merged 14 commits into from
Jul 28, 2022
Conversation

adzfolc
Contributor

@adzfolc adzfolc commented Jul 18, 2022

What type of PR is this:

  • bug
  • feature
  • enhancement
  • refactor
  • others

Which issues does this PR fix:

Fixes #8073

Problem Summary (Required):

Table binarytest has two columns, col_int and col_binary, and only col_binary is unsupported.
The goal is that an exception is thrown only when the unsupported column is referenced.

mysql> select * from binarytest;
ERROR 1064 (HY000): Column col_binary convert failed!
mysql> select col_binary from binarytest;
ERROR 1064 (HY000): Column col_binary convert failed!
mysql> select col_int from binarytest;
+---------+
| col_int |
+---------+
|     100 |
|     100 |
+---------+
2 rows in set (0.95 sec)

mysql> select col_int from binarytest where col_binary != null;
ERROR 1064 (HY000): Column col_binary convert failed!

…xception would be thrown

1. add `CONVERT_FAILED` type to primitive type
2. add hive unsupported column convert UT
3. add more code comments
@adzfolc
Contributor Author

adzfolc commented Jul 19, 2022

PTAL @DorianZheng @mofeiatwork @HangyuanLiu

@@ -156,6 +156,9 @@ public static boolean validateColumnType(String hiveType, Type type) {
primitiveType == PrimitiveType.VARCHAR;
case "BOOLEAN":
return primitiveType == PrimitiveType.BOOLEAN;
case "BINARY":
Contributor


validateColumnType is used to validate the external table schema at creation time, so we can just return a failure when the types are unsupported

Contributor Author


validateColumnType is used to validate the external table schema at creation time, so we can just return a failure when the types are unsupported

All queries on external Hive tables need to validate the Hive columns, so if we return false when a table column is unsupported, even a query that has no relation to the unsupported type will fail here.

@@ -210,6 +213,9 @@ public static Type convertColumnType(String hiveType) throws DdlException {
case "BOOLEAN":
primitiveType = PrimitiveType.BOOLEAN;
break;
case "BINARY":
Contributor


Not only Binary. All unsupported types should be captured.

Contributor Author


Not only Binary. All unsupported types should be captured.

            case "BINARY":
            case "MAP":
            case "STRUCT":
            case "UNIONTYPE":
                primitiveType = PrimitiveType.UNKNOWN_TYPE;
                break;

Now complex types and the binary type are all converted to UNKNOWN_TYPE.

Contributor


why not capture it in default branch?

Contributor Author


done
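The change discussed above — routing all unsupported Hive types through the `default` branch instead of enumerating them — can be sketched roughly as follows. This is a simplified, hypothetical illustration (the enum values and method shape mirror the PR discussion, but this is not the actual StarRocks source):

```java
// Simplified sketch of convertColumnType after the review suggestion:
// unsupported Hive types (BINARY, MAP, STRUCT, UNIONTYPE, ...) fall
// through to the default branch and map to UNKNOWN_TYPE.
public class HiveTypeConversionSketch {
    enum PrimitiveType { INT, BOOLEAN, VARCHAR, UNKNOWN_TYPE }

    static PrimitiveType convertColumnType(String hiveType) {
        switch (hiveType.toUpperCase()) {
            case "INT":     return PrimitiveType.INT;
            case "BOOLEAN": return PrimitiveType.BOOLEAN;
            case "STRING":  return PrimitiveType.VARCHAR;
            // No per-type cases for BINARY/MAP/STRUCT/UNIONTYPE:
            // anything unrecognized is UNKNOWN_TYPE.
            default:        return PrimitiveType.UNKNOWN_TYPE;
        }
    }

    public static void main(String[] args) {
        System.out.println(convertColumnType("BINARY")); // UNKNOWN_TYPE
        System.out.println(convertColumnType("MAP"));    // UNKNOWN_TYPE
        System.out.println(convertColumnType("INT"));    // INT
    }
}
```

The default branch also future-proofs the mapping: any Hive type added later that StarRocks does not yet handle is captured without another code change.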

@DorianZheng
Contributor

@adzfolc Please add more UTs to cover more cases.

@HangyuanLiu
Contributor

why not use INVALID_TYPE?

@adzfolc
Contributor Author

adzfolc commented Jul 19, 2022

why not use INVALID_TYPE?

In my opinion, INVALID_TYPE and a convert-failed type have different semantics. INVALID_TYPE marks an invalid column that is not allowed in any table; a convert-failed column is allowed to exist in a table, but not in a query — as soon as that column appears in the SQL, an error is reported.
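That distinction can be sketched as a query-time check: the column may carry the unknown type in the schema, and the error fires only when the column is actually referenced. The class and method names below are invented for illustration, not StarRocks code:

```java
import java.util.Map;

// Hypothetical sketch of the behavior this PR describes: a column of
// UNKNOWN_TYPE may exist in the table schema, but referencing it in a
// query raises an error; untouched columns remain queryable.
public class UnknownColumnCheckSketch {
    enum PrimitiveType { INT, UNKNOWN_TYPE }

    static void checkReferencedColumn(Map<String, PrimitiveType> schema, String col) {
        if (schema.get(col) == PrimitiveType.UNKNOWN_TYPE) {
            throw new IllegalStateException("Column " + col + " convert failed!");
        }
    }

    public static void main(String[] args) {
        Map<String, PrimitiveType> schema = Map.of(
                "col_int", PrimitiveType.INT,
                "col_binary", PrimitiveType.UNKNOWN_TYPE);
        checkReferencedColumn(schema, "col_int"); // supported column: no error
        try {
            checkReferencedColumn(schema, "col_binary");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Column col_binary convert failed!
        }
    }
}
```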

2. rename column type 'CONVERT_FAILED' to 'UNKNOWN_TYPE'
3. revert Be and thrift modification
@adzfolc
Contributor Author

adzfolc commented Jul 25, 2022

run starrocks_fe_unittest

@adzfolc
Contributor Author

adzfolc commented Jul 25, 2022

@@ -149,6 +150,7 @@ public abstract class Type implements Cloneable {
.add(DECIMAL64)
.add(DECIMAL128)
.add(JSON)
.add(UNKNOWN_TYPE)
Contributor


why unknown type is added into supported types?

Contributor Author


why unknown type is added into supported types?

Because when the FE analyzes a query, it needs to resolve the table, and all Hive types are converted to StarRocks types, including UNKNOWN_TYPE. In the code, ScalarType.createType() looks up UNKNOWN_TYPE in SUPPORTED_TYPES; if it is removed from that set, an exception like the one below is thrown.

2022-07-26 10:47:24,907 ERROR (starrocks-mysql-nio-pool-1|150) [HiveMetaCache.getTable():315] Failed to get table [databaseName: old_test, tableName: binarytest]
com.google.common.util.concurrent.UncheckedExecutionException: java.lang.NullPointerException: unknown type UNKNOWN_TYPE
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2051) ~[spark-dpp-1.0.0.jar:?]
    at com.google.common.cache.LocalCache.get(LocalCache.java:3962) ~[spark-dpp-1.0.0.jar:?]
    at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3985) ~[spark-dpp-1.0.0.jar:?]
    at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4946) ~[spark-dpp-1.0.0.jar:?]
    at com.starrocks.external.hive.HiveMetaCache.getTable(HiveMetaCache.java:313) ~[starrocks-fe.jar:?]
    at com.starrocks.connector.hive.HiveMetadata.getTable(HiveMetadata.java:61) ~[starrocks-fe.jar:?]
    at com.starrocks.server.MetadataMgr.lambda$getTable$1(MetadataMgr.java:114) ~[starrocks-fe.jar:?]
    at java.util.Optional.map(Optional.java:215) ~[?:1.8.0_321]
    at com.starrocks.server.MetadataMgr.getTable(MetadataMgr.java:114) ~[starrocks-fe.jar:?]
    at com.starrocks.sql.analyzer.QueryAnalyzer.resolveTable(QueryAnalyzer.java:669) ~[starrocks-fe.jar:?]
    at com.starrocks.sql.analyzer.QueryAnalyzer.access$100(QueryAnalyzer.java:66) ~[starrocks-fe.jar:?]
    at com.starrocks.sql.analyzer.QueryAnalyzer$Visitor.resolveTableRef(QueryAnalyzer.java:235) ~[starrocks-fe.jar:?]
    at com.starrocks.sql.analyzer.QueryAnalyzer$Visitor.visitSelect(QueryAnalyzer.java:161) ~[starrocks-fe.jar:?]
    at com.starrocks.sql.analyzer.QueryAnalyzer$Visitor.visitSelect(QueryAnalyzer.java:83) ~[starrocks-fe.jar:?]
    at com.starrocks.sql.ast.SelectRelation.accept(SelectRelation.java:206) ~[starrocks-fe.jar:?]
    at com.starrocks.sql.analyzer.QueryAnalyzer$Visitor.process(QueryAnalyzer.java:88) ~[starrocks-fe.jar:?]
    at com.starrocks.sql.analyzer.QueryAnalyzer$Visitor.visitQueryRelation(QueryAnalyzer.java:106) ~[starrocks-fe.jar:?]
    at com.starrocks.sql.analyzer.QueryAnalyzer$Visitor.visitQueryStatement(QueryAnalyzer.java:100) ~[starrocks-fe.jar:?]
    at com.starrocks.sql.analyzer.QueryAnalyzer$Visitor.visitQueryStatement(QueryAnalyzer.java:83) ~[starrocks-fe.jar:?]
    at com.starrocks.sql.ast.QueryStatement.accept(QueryStatement.java:36) ~[starrocks-fe.jar:?]
    at com.starrocks.sql.analyzer.QueryAnalyzer$Visitor.process(QueryAnalyzer.java:88) ~[starrocks-fe.jar:?]
    at com.starrocks.sql.analyzer.QueryAnalyzer.analyze(QueryAnalyzer.java:76) ~[starrocks-fe.jar:?]
    at com.starrocks.sql.analyzer.Analyzer$AnalyzerVisitor.visitQueryStatement(Analyzer.java:204) ~[starrocks-fe.jar:?]
    at com.starrocks.sql.analyzer.Analyzer$AnalyzerVisitor.visitQueryStatement(Analyzer.java:68) ~[starrocks-fe.jar:?]
    at com.starrocks.sql.ast.QueryStatement.accept(QueryStatement.java:36) ~[starrocks-fe.jar:?]
    at com.starrocks.sql.ast.AstVisitor.visit(AstVisitor.java:104) ~[starrocks-fe.jar:?]
    at com.starrocks.sql.analyzer.Analyzer$AnalyzerVisitor.analyze(Analyzer.java:70) ~[starrocks-fe.jar:?]
    at com.starrocks.sql.analyzer.Analyzer.analyze(Analyzer.java:65) ~[starrocks-fe.jar:?]
    at com.starrocks.sql.StatementPlanner.plan(StatementPlanner.java:43) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.StmtExecutor.execute(StmtExecutor.java:346) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:273) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.ConnectProcessor.dispatch(ConnectProcessor.java:391) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.ConnectProcessor.processOnce(ConnectProcessor.java:624) ~[starrocks-fe.jar:?]
    at com.starrocks.mysql.nio.ReadListener.lambda$handleEvent$0(ReadListener.java:55) ~[starrocks-fe.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_321]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_321]
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_321]
Caused by: java.lang.NullPointerException: unknown type UNKNOWN_TYPE
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:899) ~[spark-dpp-1.0.0.jar:?]
    at com.starrocks.catalog.ScalarType.createType(ScalarType.java:106) ~[starrocks-fe.jar:?]
    at com.starrocks.external.HiveMetaStoreTableUtils.convertColumnType(HiveMetaStoreTableUtils.java:229) ~[starrocks-fe.jar:?]
    at com.starrocks.external.HiveMetaStoreTableUtils.convertToSRTable(HiveMetaStoreTableUtils.java:247) ~[starrocks-fe.jar:?]
    at com.starrocks.external.hive.HiveMetaCache.loadTable(HiveMetaCache.java:323) ~[starrocks-fe.jar:?]
    at com.starrocks.external.hive.HiveMetaCache.access$700(HiveMetaCache.java:39) ~[starrocks-fe.jar:?]
    at com.starrocks.external.hive.HiveMetaCache$8.load(HiveMetaCache.java:144) ~[starrocks-fe.jar:?]
    at com.starrocks.external.hive.HiveMetaCache$8.load(HiveMetaCache.java:141) ~[starrocks-fe.jar:?]
    at com.google.common.cache.CacheLoader$1.load(CacheLoader.java:192) ~[spark-dpp-1.0.0.jar:?]
    at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3529) ~[spark-dpp-1.0.0.jar:?]
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2278) ~[spark-dpp-1.0.0.jar:?]
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2155) ~[spark-dpp-1.0.0.jar:?]
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2045) ~[spark-dpp-1.0.0.jar:?]
    ... 36 more

Collaborator


I think you should change the convertColumnType method, not here. I think it's unreasonable we make the UNKNOWN_TYPE as the SUPPORTED_TYPES, but we actually don't support it.

Contributor Author


I think you should change the convertColumnType method, not here. I think it's unreasonable we make the UNKNOWN_TYPE as the SUPPORTED_TYPES, but we actually don't support it.

In convertColumnType, I convert the Hive column type to a StarRocks column type, so even unsupported types like Hive BINARY must be given some StarRocks column type. UNKNOWN_TYPE matches this case, so UNKNOWN_TYPE needs to be one of the PrimitiveType values. It is admittedly counter-intuitive, but we still need it.
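The constraint described above — createType() consults a static registry and fails hard on anything absent from it — can be sketched like this. The names are simplified stand-ins for the StarRocks classes, not the real implementation:

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative sketch of why UNKNOWN_TYPE must appear in SUPPORTED_TYPES:
// createType() looks the type up in a static set and throws if it is
// absent, which would make table resolution fail before any column is
// even referenced (the NPE in the stack trace above).
public class ScalarTypeSketch {
    enum PrimitiveType { INT, BOOLEAN, UNKNOWN_TYPE }

    static final Set<PrimitiveType> SUPPORTED_TYPES =
            EnumSet.of(PrimitiveType.INT, PrimitiveType.BOOLEAN, PrimitiveType.UNKNOWN_TYPE);

    static PrimitiveType createType(PrimitiveType t) {
        if (!SUPPORTED_TYPES.contains(t)) {
            throw new NullPointerException("unknown type " + t);
        }
        return t;
    }

    public static void main(String[] args) {
        // Resolves instead of throwing, so SELECT on the other columns works.
        System.out.println(createType(PrimitiveType.UNKNOWN_TYPE));
    }
}
```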

@adzfolc
Contributor Author

adzfolc commented Jul 26, 2022

run starrocks_be_unittest

@adzfolc
Contributor Author

adzfolc commented Jul 26, 2022

@adzfolc
Contributor Author

adzfolc commented Jul 26, 2022

run starrocks_be_build

@@ -367,6 +378,7 @@ public abstract class Type implements Cloneable {
compatibilityMatrix[DATETIME.ordinal()][DECIMAL32.ordinal()] = PrimitiveType.DECIMAL32;
compatibilityMatrix[DATETIME.ordinal()][DECIMAL64.ordinal()] = PrimitiveType.DECIMAL64;
compatibilityMatrix[DATETIME.ordinal()][DECIMAL128.ordinal()] = PrimitiveType.DECIMAL128;
compatibilityMatrix[DATETIME.ordinal()][UNKNOWN_TYPE.ordinal()] = PrimitiveType.INVALID_TYPE;
Collaborator


why do we need to add the UNKNOWN_TYPE to compatibilityMatrix?

Contributor Author


why do we need to add the UNKNOWN_TYPE to compatibilityMatrix?

Because UNKNOWN_TYPE is added to PrimitiveType, other types will be compared against UNKNOWN_TYPE during analysis, so the matrix needs entries for those combinations.
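A miniature of the compatibilityMatrix entries shown in the diff, as a hypothetical sketch (enum values reduced to a handful; not the real Type class): every combination involving UNKNOWN_TYPE resolves to INVALID_TYPE, so an expression mixing an unsupported column with any other type fails analysis.

```java
// Sketch of the new compatibilityMatrix rows: mixing UNKNOWN_TYPE with
// any concrete type yields INVALID_TYPE, while existing entries such as
// DATETIME x DECIMAL128 keep their usual promotion.
public class CompatibilityMatrixSketch {
    enum PrimitiveType { DATETIME, DECIMAL128, UNKNOWN_TYPE, INVALID_TYPE }

    static final PrimitiveType[][] matrix =
            new PrimitiveType[PrimitiveType.values().length][PrimitiveType.values().length];

    static {
        matrix[PrimitiveType.DATETIME.ordinal()][PrimitiveType.DECIMAL128.ordinal()] =
                PrimitiveType.DECIMAL128;
        matrix[PrimitiveType.DATETIME.ordinal()][PrimitiveType.UNKNOWN_TYPE.ordinal()] =
                PrimitiveType.INVALID_TYPE;
    }

    public static void main(String[] args) {
        System.out.println(
                matrix[PrimitiveType.DATETIME.ordinal()]
                      [PrimitiveType.UNKNOWN_TYPE.ordinal()]); // INVALID_TYPE
    }
}
```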

@wanpengfei-git
Collaborator

[FE PR Coverage Check]

😍 pass : 38 / 41 (92.68%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 com/starrocks/external/HiveMetaStoreTableUtils.java 1 2 50.00% [209]
🔵 com/starrocks/sql/analyzer/Scope.java 2 3 66.67% [58]
🔵 com/starrocks/sql/analyzer/SelectAnalyzer.java 4 5 80.00% [199]
🔵 com/starrocks/catalog/PrimitiveType.java 2 2 100.00% []
🔵 com/starrocks/catalog/HiveTable.java 2 2 100.00% []
🔵 com/starrocks/catalog/ScalarType.java 2 2 100.00% []
🔵 com/starrocks/catalog/Type.java 25 25 100.00% []

@sonarcloud

sonarcloud bot commented Jul 28, 2022

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 3 Code Smells

0.0% Coverage
0.0% Duplication

@adzfolc
Contributor Author

adzfolc commented Jul 28, 2022

@wanpengfei-git wanpengfei-git added the Approved Ready to merge label Jul 28, 2022
@wanpengfei-git
Collaborator

run starrocks_admit_test

@imay imay merged commit b3c4d10 into StarRocks:main Jul 28, 2022
@adzfolc adzfolc deleted the feat-filter-unsupported-type branch July 29, 2022 07:00
imay pushed a commit that referenced this pull request Aug 2, 2022
Related to #8881 
When there are unsupported types in an Iceberg table, no exception is thrown as long as users don't select columns with such types, like the following:
```sql
hive> select * from t.m;
OK
1	{"a":"b"}
MySQL [(none)]> select * from ice.t.m;
ERROR 1064 (HY000): External Table Column [m] convert failed, and column type is known!
MySQL [(none)]> select i from ice.t.m;
+------+
| i    |
+------+
|    1 |
+------+
1 row in set (0.08 sec)
```
@adzfolc
Contributor Author

adzfolc commented Aug 17, 2022

@Mergifyio backport branch-2.3

@mergify
Contributor

mergify bot commented Aug 17, 2022

backport branch-2.3

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Aug 17, 2022
…hrown. (#8881)

Table binarytest has two columns, col_int and col_binary, and only col_binary is unsupported.
The goal is that an exception is thrown only when the unsupported column is referenced.

(cherry picked from commit b3c4d10)
dirtysalt pushed a commit that referenced this pull request Aug 18, 2022
* [BugFix]Only when unsupported column is referred, exception will be thrown. (#8881)

Table binarytest has two columns, col_int and col_binary, and only col_binary is unsupported.
The goal is that an exception is thrown only when the unsupported column is referenced.

(cherry picked from commit b3c4d10)
Labels
Approved Ready to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

hive table which contains binary type unavailable through hive connector
6 participants