Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-1581] UDTF like in hive #1138

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
33 changes: 28 additions & 5 deletions core/src/main/codegen/templates/Parser.jj
Original file line number Diff line number Diff line change
Expand Up @@ -1728,15 +1728,38 @@ List<SqlNode> SelectList() :
SqlNode SelectItem() :
{
SqlNode e;
final SqlIdentifier id;
SqlIdentifier columnAlias;
final List<SqlNode> columnAliases = new ArrayList();
final Span s = span();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename ids to columnAliases and id to columnAlias ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for you suggestions.

}
{
e = SelectExpression()
[
[ <AS> ]
id = SimpleIdentifier() {
e = SqlStdOperatorTable.AS.createCall(span().end(e), e, id);
}
(
LOOKAHEAD(2) (
[ <AS> ]
columnAlias = SimpleIdentifier()
{
e = SqlStdOperatorTable.AS.createCall(s.end(e), e, columnAlias);
}
)
|
(
<AS> <LPAREN>
columnAlias = SimpleIdentifier() {
columnAliases.add(columnAlias);
}
( <COMMA> columnAlias = SimpleIdentifier() { columnAliases.add(columnAlias);} )*
<RPAREN> {
if (!this.conformance.allowSelectTableFunction()) {
throw SqlUtil.newContextException(getPos(),
RESOURCE.notAllowTableFunctionInSelect());
}
e = SqlStdOperatorTable.AS.createCall(s.end(e), e,
new SqlNodeList(columnAliases, s.end(e)));
}
)
)
]
{
return e;
Expand Down
12 changes: 12 additions & 0 deletions core/src/main/java/org/apache/calcite/runtime/CalciteResource.java
Original file line number Diff line number Diff line change
Expand Up @@ -900,6 +900,18 @@ ExInst<CalciteException> invalidTypesForComparison(String clazzName0, String op,

@BaseMessage("Not a valid input for REGEXP_REPLACE: ''{0}''")
ExInst<CalciteException> invalidInputForRegexpReplace(String value);

@BaseMessage("Table function is not allowed in select list in current SQL conformance level")
ExInst<SqlValidatorException> notAllowTableFunctionInSelect();

@BaseMessage("''{0}'' should be a table function")
ExInst<SqlValidatorException> exceptTableFunction(String name);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exceptTableFunction -> notTableFunctionError ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well this exception means we except a table function for func in select func() as (t0).

@BaseMessage("Only one table function is allowed in select list")
ExInst<SqlValidatorException> onlyOneTableFunctionAllowedInSelect();

@BaseMessage("Table function is not allowed in aggregate statement")
ExInst<SqlValidatorException> notAllowTableFunctionInAggregate();
}

// End CalciteResource.java
25 changes: 20 additions & 5 deletions core/src/main/java/org/apache/calcite/sql/SqlAsOperator.java
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@
import org.apache.calcite.sql.validate.SqlValidatorScope;
import org.apache.calcite.util.Util;

import java.util.ArrayList;
import java.util.List;

import static org.apache.calcite.util.Static.RESOURCE;
Expand Down Expand Up @@ -102,12 +103,26 @@ public void validateCall(
// we don't want to validate the identifier.
final List<SqlNode> operands = call.getOperandList();
assert operands.size() == 2;
assert operands.get(1) instanceof SqlIdentifier;
operands.get(0).validateExpr(validator, scope);
SqlIdentifier id = (SqlIdentifier) operands.get(1);
if (!id.isSimple()) {
throw validator.newValidationError(id,
RESOURCE.aliasMustBeSimpleIdentifier());

SqlNode asIdentifier = operands.get(1);
assert asIdentifier instanceof SqlIdentifier
|| (asIdentifier instanceof SqlNodeList
&& validator.getConformance().allowSelectTableFunction());

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should throw exception if the conformance does not allow table function instead of using assert.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a check for this in the Parser.jj, So this should not happen here. I use the assert but not the exception.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simple rule of thumb. If you throw an exception (as opposed to using assert) then you must add a test for that exception. If the parser has already checked, and ensured that the condition is impossible, then you will find it impossible to write a test case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank @julianhyde . There is a test case in SqlParserTest for this exception.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused. In validateSelectList, this change has already rewriten the select items by handleTableFunctionInSelect, how is it possible that SqlAsOperator does validation here ?
Can you explain or give a test that the assertion can be failed ?

List<SqlNode> ids = new ArrayList<>();
if (asIdentifier instanceof SqlIdentifier) {
ids.add(operands.get(1));
} else {
ids.addAll(((SqlNodeList) operands.get(1)).getList());
}
for (int i = 0; i < ids.size(); i++) {
assert ids.get(i) instanceof SqlIdentifier;
SqlIdentifier id = (SqlIdentifier) ids.get(i);
if (!id.isSimple()) {
throw validator.newValidationError(id,
RESOURCE.aliasMustBeSimpleIdentifier());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent -- 4 spaces

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your suggestion.

}
}
}

Expand Down
15 changes: 15 additions & 0 deletions core/src/main/java/org/apache/calcite/sql/SqlUtil.java
Original file line number Diff line number Diff line change
Expand Up @@ -1143,6 +1143,21 @@ private void visitChild(SqlNode node) {
return check(type);
}
}

/**
* Whether the selectItem is a table function node in the select.
* eg. "select table_func(1) as (f0,f1)"
*
* @param selectItem select item
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should leave a empty line before @param. Other java docs have same issue. Please format it.

* @return true if this selectItem is a table function call
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be true if this.... instead of true is this.... Right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for correct me.

public static boolean isTableFunctionInSelect(SqlNode selectItem) {
if (selectItem.getKind() == SqlKind.AS) {
SqlBasicCall call = (SqlBasicCall) selectItem;
return call.getOperands()[1] instanceof SqlNodeList;
}
return false;
}
}

// End SqlUtil.java
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@
import org.apache.calcite.sql.SqlIdentifier;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.SqlNodeList;
import org.apache.calcite.sql.SqlUtil;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.sql.parser.SqlParserPos;
import org.apache.calcite.util.Util;
Expand Down Expand Up @@ -69,7 +70,12 @@ protected RelDataType validateImpl(RelDataType targetRowType) {
final SqlValidatorNamespace childNs =
validator.getNamespace(operands.get(0));
final RelDataType rowType = childNs.getRowTypeSansSystemColumns();
final List<SqlNode> columnNames = Util.skip(operands, 2);
List<SqlNode> columnNames;
if (SqlUtil.isTableFunctionInSelect(call)) {
columnNames = ((SqlNodeList) operands.get(1)).getList();
} else {
columnNames = Util.skip(operands, 2);
}
for (final SqlNode operand : columnNames) {
String name = ((SqlIdentifier) operand).getSimple();
if (nameList.contains(name)) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -106,6 +106,10 @@ public boolean allowExtendedTrim() {
public boolean allowPluralTimeUnits() {
return SqlConformanceEnum.DEFAULT.allowPluralTimeUnits();
}

public boolean allowSelectTableFunction() {
return SqlConformanceEnum.DEFAULT.allowSelectTableFunction();
}
}

// End SqlAbstractConformance.java
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,7 @@ public interface SqlConformance {
@Deprecated // to be removed before 2.0
SqlConformanceEnum PRAGMATIC_2003 = SqlConformanceEnum.PRAGMATIC_2003;

SqlConformanceEnum HIVE = SqlConformanceEnum.HIVE;
/**
* Whether this dialect supports features from a wide variety of
* dialects. This is enabled for the Babel parser, disabled otherwise.
Expand Down Expand Up @@ -408,6 +409,13 @@ public interface SqlConformance {
* false otherwise.
*/
boolean allowPluralTimeUnits();

/**
* Whether SELECT can contain a table function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about SELECT -> select list ?

* <p>For example, consider the query
* <blockquote><pre> SELECT SPLIT(col) AS (F0, F1) FROM A </pre> </blockquote>
*/
boolean allowSelectTableFunction();
}

// End SqlConformance.java
Original file line number Diff line number Diff line change
Expand Up @@ -69,7 +69,11 @@ public enum SqlConformanceEnum implements SqlConformance {

/** Conformance value that instructs Calcite to use SQL semantics
* consistent with Microsoft SQL Server version 2008. */
SQL_SERVER_2008;
SQL_SERVER_2008,

/** Conformance value that instructs Calcite to use SQL semantics
* consistent with Hive version. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the word of version or specify the version.

HIVE;

public boolean isLiberal() {
switch (this) {
Expand Down Expand Up @@ -314,6 +318,14 @@ public boolean allowExtendedTrim() {
}
}

public boolean allowSelectTableFunction() {
switch (this) {
case HIVE:
return true;
default:
return false;
}
}
}

// End SqlConformanceEnum.java
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,9 @@ protected SqlDelegatingConformance(SqlConformance delegate) {
return delegate.allowNiladicParentheses();
}

@Override public boolean allowSelectTableFunction() {
return delegate.allowSelectTableFunction();
}
}

// End SqlDelegatingConformance.java
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
import org.apache.calcite.runtime.CalciteContextException;
import org.apache.calcite.runtime.CalciteException;
import org.apache.calcite.runtime.Resources;
import org.apache.calcite.sql.SqlBasicCall;
import org.apache.calcite.sql.SqlCall;
import org.apache.calcite.sql.SqlDataTypeSpec;
import org.apache.calcite.sql.SqlDelete;
Expand Down Expand Up @@ -749,6 +750,14 @@ CalciteException handleUnresolvedFunction(SqlCall call,
*/
SqlValidatorScope getOverScope(SqlNode node);

/**
* Returns the table function SqlBasicCall in SqlSelect.
*
* @param select The select node
* @return The table function node associate with the select node
*/
SqlBasicCall getTableFunctionInSelect(SqlSelect select);

/**
* Validates that a query is capable of producing a return of given modality
* (relational or streaming).
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -294,6 +294,9 @@ public class SqlValidatorImpl implements SqlValidatorWithHints {
// Flag saying if we enable the implicit type coercion.
private boolean enableTypeCoercion;

// Mapping the table function and the select node.
private final Map<SqlSelect, SqlBasicCall> selectTableFunctions =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it -> its?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reminding.

new IdentityHashMap<>();
//~ Constructors -----------------------------------------------------------

/**
Expand Down Expand Up @@ -1067,6 +1070,10 @@ public SqlValidatorScope getOverScope(SqlNode node) {
return scopes.get(node);
}

public SqlBasicCall getTableFunctionInSelect(SqlSelect select) {
return selectTableFunctions.get(select);
}

private SqlValidatorNamespace getNamespace(SqlNode node,
SqlValidatorScope scope) {
if (node instanceof SqlIdentifier && scope instanceof DelegatingScope) {
Expand Down Expand Up @@ -4137,6 +4144,14 @@ protected RelDataType validateSelectList(
expandedSelectItems,
aliases,
fieldList);
} else if (SqlUtil.isTableFunctionInSelect(selectItem)) {
handleTableFunctionInSelect(
select,
(SqlBasicCall) selectItem,
expandedSelectItems,
aliases,
fieldList
);
} else {
expandSelectItem(
selectItem,
Expand Down Expand Up @@ -4247,6 +4262,114 @@ private void handleScalarSubQuery(
fieldList.add(Pair.of(alias, nodeType));
}

/**
* Process table function found in select list.Checks that is
* actually a table function and validate the table function
* count in Select list.
*
* @param parentSelect Base SqlSelect
* @param selectItem Child select items from select list
* @param expandedSelectItems Select items after processing
* @param aliases Built from user or system values
* @param fields Built up entries for each select list entry
*/
private void handleTableFunctionInSelect(
SqlSelect parentSelect,
SqlBasicCall selectItem,
List<SqlNode> expandedSelectItems,
Set<String> aliases,
List<Map.Entry<String, RelDataType>> fields) {
SqlBasicCall functionCall = (SqlBasicCall) selectItem.getOperands()[0];
SqlFunction function = (SqlFunction) functionCall.getOperator();
// Check whether there are more than one table function in select list.
for (SqlNode item : parentSelect.getSelectList()) {
if (SqlUtil.isTableFunctionInSelect(item)
&& item != selectItem) {
throw newValidationError(parentSelect.getSelectList(),
RESOURCE.onlyOneTableFunctionAllowedInSelect());
}
}

// Change the function category to USER_DEFINED_TABLE_FUNCTION.
// It is because that in sql-select list, the SqlFunctionCategory is USER_DEFINED_FUNCTION
// for a SqlUnresolvedFunction,so we should do this change.
if (function instanceof SqlUnresolvedFunction) {
if (!function.getFunctionType().isTableFunction()) {
SqlFunction newFunction =
new SqlUnresolvedFunction(function.getNameAsId(),
function.getReturnTypeInference(),
function.getOperandTypeInference(),
function.getOperandTypeChecker(),
function.getParamTypes(),
SqlFunctionCategory.USER_DEFINED_TABLE_FUNCTION);
functionCall.setOperator(newFunction);
function = newFunction;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain a bit more about this transformation? I am a little confused.

}
}
// Check functionCall whether is a table function
List<SqlOperator> overloads = new ArrayList<>();
opTab.lookupOperatorOverloads(function.getNameAsId(),
function.getFunctionType(),
function.getSyntax(), overloads, catalogReader.nameMatcher());
if (overloads.size() == 0) {
throw newValidationError(functionCall,
RESOURCE.exceptTableFunction(function.getName()));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overloads.size() == 0 might mean that there is no UDTF called function.getName() . Therefore it is not proper if the error message is ... should be a table function.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this happened in the case "select func() as (t0,t1)", the func should be a table function, if not, an exception throws out.

// Check the parent select whether is a aggregate statement
if (isAggregate(parentSelect)) {
throw newValidationError(functionCall,
RESOURCE.notAllowTableFunctionInAggregate());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should do this check at the beginning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about putting the isAggregate(parentSelect) at the beginning?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well this check only happened after the check of table function exists in the select statement.So It cannot be placed somewhere before here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! Thanks.

SqlNodeList aliasNodes
= (SqlNodeList) selectItem.getOperands()[1];
List<String> aliasList = new ArrayList<>();
for (SqlNode aliasNode : aliasNodes) {
aliasList.add(deriveAlias(aliasNode, aliasList.size()));
}
aliases.addAll(aliasList);

String tableAlias = functionCall.getOperator().getName();
// Expand the table function alias
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just use the function name as the alias prefix, the TABLE_FUNCTION_PREFIX seems unnecessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your suggestion. done!

for (String alias : aliasList) {
SqlIdentifier id = new SqlIdentifier(Lists.newArrayList
(tableAlias, alias), SqlParserPos.ZERO);
expandedSelectItems.add(id);
}

// Register namespace for table function
SqlValidatorScope fromScope = getFromScope(parentSelect);
ProcedureNamespace tableNs = new ProcedureNamespace(this,
fromScope, functionCall, selectItem);
tableNs.validateImpl(unknownType);
registerNamespace(null, null,
tableNs, false);
AliasNamespace aliasNs = new AliasNamespace(this,
selectItem, parentSelect);
aliasNs.validateImpl(unknownType);
registerNamespace(getSelectScope(parentSelect),
tableAlias, aliasNs, false);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the format of these pieces of code is not expected.

// Create a table scope for table function
TableScope tableScope = new TableScope(fromScope, parentSelect);
if (fromScope instanceof ListScope) {
for (ScopeChild child : ((ListScope) fromScope).children) {
tableScope.addChild(child.namespace, child.name, child.nullable);
}
}
scopes.put(functionCall, tableScope);
// Associate the select with the table function
selectTableFunctions.put(parentSelect, functionCall);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm skeptical about this member selectTableFunctions, the SqlValidator never has such mapping, it handles all the states to the "Namespace" and "scope", i'm wondering if we can move this mapping into each scope, because each select has a scope. Thus, we can keep the validator code clean.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think selectTableFunctions is the same to the whereScopes or groupByScopes in the SqlValidatorImpl.

RelDataType type = aliasNs.getRowType();
setValidatedNodeType(selectItem, type);
for (int i = 0; i < aliasList.size(); i++) {
fields.add(
Pair.of(
aliasList.get(i), type.getFieldList()
.get(i).getType()));
}
}

/**
* Derives a row-type for INSERT and UPDATE operations.
*
Expand Down
Loading