Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] information_schema.COLUMNS support complex type #33431

Merged
merged 2 commits into from
Oct 24, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 12 additions & 3 deletions be/src/exec/schema_scanner/schema_columns_scanner.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -322,8 +322,12 @@ Status SchemaColumnsScanner::fill_chunk(ChunkPtr* chunk) {
// DATA_TYPE
{
ColumnPtr column = (*chunk)->get_column_by_slot_id(8);
std::string str = to_mysql_data_type_string(_desc_result.columns[_column_index].columnDesc);
Slice value(str.c_str(), str.length());
std::string value;
if (_desc_result.columns[_column_index].columnDesc.__isset.dataType) {
value = _desc_result.columns[_column_index].columnDesc.dataType;
} else {
value = to_mysql_data_type_string(_desc_result.columns[_column_index].columnDesc);
}
fill_column_with_slot<TYPE_VARCHAR>(column.get(), (void*)&value);
}
break;
Expand Down Expand Up @@ -420,7 +424,12 @@ Status SchemaColumnsScanner::fill_chunk(ChunkPtr* chunk) {
// COLUMN_TYPE
{
ColumnPtr column = (*chunk)->get_column_by_slot_id(16);
std::string value = type_to_string(_desc_result.columns[_column_index].columnDesc);
std::string value;
if (_desc_result.columns[_column_index].columnDesc.__isset.columnTypeStr) {
value = _desc_result.columns[_column_index].columnDesc.columnTypeStr;
} else {
value = type_to_string(_desc_result.columns[_column_index].columnDesc);
}
fill_column_with_slot<TYPE_VARCHAR>(column.get(), (void*)&value);
}
break;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code changes appear correct and improve the program by checking if dataType or columnTypeStr are set before attempting to use them. However, some improvements can be made to enhance readability, maintainability, and safety.

  1. Use meaningful variable names:

Consider renaming _desc_result to something more descriptive that reflects what this variable holds.

  1. Check for null pointers:

While it seems that the dereferenced values may not be null based on the given context, it would add robustness to your code to check for null references on (*chunk)->get_column_by_slot_id(8) and (*chunk)->get_column_by_slot_id(16) calls.

  1. Avoid C-style casting of std::string in C++:

The (void*)&value style cast from std::string to void* isn't a good practice in modern C++. What you want to pass should be the internal char array of the string, so use value.c_str() instead.

  1. Extend the comments:

Consider providing more explanation in their comments above each section of code. This makes it easier for others (and yourself in the future) to understand the purpose of this piece of code.

  1. Code duplication:

The block of code that extracts values out with the fall back to to_mysql_data_type_string/type_to_string method for default value looks identical except for specific fields/accessors. This could potentially be abstracted into a function to comply with the DRY (Don't Repeat Yourself) principle.

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -157,6 +157,15 @@ public boolean isBooleanType() {
public boolean isNullTypeItem() {
return itemType.isNull();
}

public String toMysqlDataTypeString() {
return "array";
}

// This implementation is the same as BE schema_columns_scanner.cpp type_to_string
public String toMysqlColumnTypeString() {
return toSql();
}
}


9 changes: 9 additions & 0 deletions fe/fe-core/src/main/java/com/starrocks/catalog/MapType.java
Original file line number Diff line number Diff line change
Expand Up @@ -208,5 +208,14 @@ public MapType deserialize(JsonElement jsonElement, java.lang.reflect.Type type,
return new MapType(keyType, valueType);
}
}

public String toMysqlDataTypeString() {
return "map";
}

// This implementation is the same as BE schema_columns_scanner.cpp type_to_string
public String toMysqlColumnTypeString() {
return toSql();
}
}

30 changes: 30 additions & 0 deletions fe/fe-core/src/main/java/com/starrocks/catalog/ScalarType.java
Original file line number Diff line number Diff line change
Expand Up @@ -602,6 +602,7 @@ public String toSql(int depth) {
case DATETIME:
case HLL:
case BITMAP:
case BINARY:
case PERCENTILE:
case JSON:
case FUNCTION:
Expand Down Expand Up @@ -845,4 +846,33 @@ public String canonicalName() {
return toString();
}
}

// This implementation is the same as BE schema_columns_scanner.cpp to_mysql_data_type_string
public String toMysqlDataTypeString() {
switch (type) {
case BOOLEAN:
return "tinyint";
case LARGEINT:
return "bigint unsigned";
case DECIMAL32:
case DECIMAL64:
case DECIMAL128:
case DECIMALV2:
return "decimal";
default:
return type.toString().toLowerCase();
}
}

// This implementation is the same as BE schema_columns_scanner.cpp type_to_string
public String toMysqlColumnTypeString() {
switch (type) {
case BOOLEAN:
return "tinyint(1)";
case LARGEINT:
return "bigint(20) unsigned";
default:
return toSql();
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is generally well written. It's formatted consistently with a lot of alignment, indicating it likely adheres to your style guide. There are a few suggestions that could help improve the code:

  1. Handle enum type more explicitly: In both toMysqlDataTypeString() and toMysqlColumnTypeString() methods, you provided a 'default' case to handle all other types that are not explicitly called out. This loose handling might not always be best because if the internal implementation changes and extra types were added, they would all come into the 'default' branch thus possibly causing issues.

    default:
        return type.toString().toLowerCase();

    You could add an explicit handling for each known type and a default throw for unknown types.

  2. Use toString() directly in default case of new methods: You're calling .toLowerCase() on the enumeration name. The ENUM.name() method returns the name of this enum constant, exactly as declared in its enum declaration. So, the .toLowerCase() seems redundant unless there's a specific use-case to cover.

  3. Add comments to explain method logic and branches: Your existing comments are descriptive but don’t really describe what the method does or why certain steps are necessary within the function. They seem to reference another implementation spanning different files, which might raise maintainability concerns in the long run.

  4. Use of value-specific mappings: Depending on how complex your system is or can become, it may be more maintainable to use a Map to store these explicit conversions instead of a switch block.

  5. Unit tests: It's ideal to also include testing for these methods to ensure correctness and prevent potential regressions. If tests exist elsewhere, they should be updated to cover these changes too.

Original file line number Diff line number Diff line change
Expand Up @@ -317,5 +317,14 @@ public StructType deserialize(JsonElement jsonElement, java.lang.reflect.Type ty
return new StructType(structFields, isNamed);
}
}

public String toMysqlDataTypeString() {
return "struct";
}

// This implementation is the same as BE schema_columns_scanner.cpp type_to_string
public String toMysqlColumnTypeString() {
return toSql();
}
}

10 changes: 10 additions & 0 deletions fe/fe-core/src/main/java/com/starrocks/catalog/Type.java
Original file line number Diff line number Diff line change
Expand Up @@ -1776,4 +1776,14 @@ public static Type getInnermostType(Type type) throws AnalysisException {
public String canonicalName() {
return toString();
}

// This is used for information_schema.COLUMNS DATA_TYPE
public String toMysqlDataTypeString() {
return "unknown";
}

// This is used for information_schema.COLUMNS COLUMN_TYPE
public String toMysqlColumnTypeString() {
return "unknown";
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This review is based on the code snippet provided:

  1. Comments: The comments should be more informative, describing not just where the methods are used but also what they do. E.g., // This method returns the MySQL data type string for information_schema.COLUMNS DATA_TYPE

  2. Constants: If "unknown" is a fallback value that could potentially be used in other places inside your codebase as well, consider making it a constant, e.g., public static final String UNKNOWN = "unknown";. Using a constant improves readability and makes your codebase easier to maintain.

  3. Method purpose: Right now both toMysqlDataTypeString() and toMysqlColumnTypeString() methods are returning "unknown". Without additional context, this suggests that either these methods aren't fully implemented yet or their purpose isn't clear. Before these methods get used, ensure they serve their intended purpose and update the return statement with proper dynamic values instead of a hardcoded "unknown".

  4. Unit Tests: These added methods should have corresponding unit tests that validate correct behavior.

  5. Follow Code Conventions: Depending on the codebase's established policies or the language style guide used (Java in this case), make sure you're consistent in following those conventions. For example, you'd typically find a single line between methods in most Java style guides.

Again this review assumes absent elements such as exceptions handling etc., are handled elsewhere in your codebase.

Original file line number Diff line number Diff line change
Expand Up @@ -1096,6 +1096,8 @@ private boolean setColumnDesc(List<TColumnDef> columns, Table table, long limit,
} else {
desc.setColumnKey("");
}
desc.setDataType(column.getType().toMysqlDataTypeString());
desc.setColumnTypeStr(column.getType().toMysqlColumnTypeString());
final TColumnDef colDef = new TColumnDef(desc);
final String comment = column.getComment();
if (comment != null) {
Expand Down
36 changes: 36 additions & 0 deletions fe/fe-core/src/test/java/com/starrocks/catalog/TypeTest.java
Original file line number Diff line number Diff line change
Expand Up @@ -203,6 +203,42 @@ public void testCanonicalName() {
}
}

@Test
public void testMysqlDataType() {
Object[][] testCases = new Object[][] {
{ScalarType.createType(PrimitiveType.BOOLEAN), "tinyint"},
{ScalarType.createType(PrimitiveType.LARGEINT), "bigint unsigned"},
{ScalarType.createDecimalV3NarrowestType(18, 4), "decimal"},
{new ArrayType(Type.INT), "array"},
{new MapType(Type.INT, Type.INT), "map"},
{new StructType(Lists.newArrayList(Type.INT)), "struct"},
};

for (Object[] tc : testCases) {
Type type = (Type) tc[0];
String name = (String) tc[1];
Assert.assertEquals(name, type.toMysqlDataTypeString());
}
}

@Test
public void testMysqlColumnType() {
Object[][] testCases = new Object[][] {
{ScalarType.createType(PrimitiveType.BOOLEAN), "tinyint(1)"},
{ScalarType.createType(PrimitiveType.LARGEINT), "bigint(20) unsigned"},
{ScalarType.createDecimalV3NarrowestType(18, 4), "decimal64(18, 4)"},
{new ArrayType(Type.INT), "array<int(11)>"},
{new MapType(Type.INT, Type.INT), "map<int(11),int(11)>"},
{new StructType(Lists.newArrayList(Type.INT)), "struct<col1 int(11)>"},
};

for (Object[] tc : testCases) {
Type type = (Type) tc[0];
String name = (String) tc[1];
Assert.assertEquals(name, type.toMysqlColumnTypeString());
}
}

@Test
public void testMapSerialAndDeser() {
// map<int,struct<c1:int,cc1:string>>
Expand Down
3 changes: 3 additions & 0 deletions gensrc/thrift/FrontendService.thrift
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,9 @@ struct TColumnDesc {
23: optional string dbName
24: optional string tableName
25: optional string columnDefault
// Let FE control the type, which makes it easier to modify and display complex types
26: optional string columnTypeStr
27: optional string dataType
}

// A column definition; used by CREATE TABLE and DESCRIBE <table> statements. A column
Expand Down