InvalidProtocolBufferException with complex queries #31
ZetaSQL's Java API uses a gRPC service to call into the actual C++ implementation of ZetaSQL. By default, the serialization logic for that communication allows a nesting depth in protobuf messages of up to 100. However, long queries can exceed that level of nesting and as a result cannot be analyzed by default. This change implements a reflection-based patch that allows users to override that limit with a greater number. The patch is brittle by design and should be used with caution. Fixes #31
We implemented a workaround for this issue. You can now raise the limit by calling `ZetaSQLPatcher.patchMaxProtobufNestingDepth(1000);` before analyzing your queries. For context, this happens because ZetaSQL's Java API uses a gRPC service to call into the actual C++ implementation, and by default the serialization logic of that channel allows a protobuf nesting depth of only 100. This feature of the toolkit monkey patches ZetaSQL's gRPC client and raises that limit to the value provided by the caller. Setting the limit to 1000 seems like a reasonable target and should allow parsing and analyzing queries much larger than before. However, do use this with caution and only when needed: it relies on reflection to modify the internals of ZetaSQL's gRPC client at runtime. The approach is brittle by design and could break in a later release, so consider the feature experimental.
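To illustrate the technique behind the patch: modifying a private static limit inside a library class via reflection. The sketch below is self-contained and hypothetical; `NestingLimitedParser` is a stand-in for the real gRPC client internals, which are not shown in this thread. Only the general reflection approach (and the default limit of 100) comes from the discussion above.

```java
import java.lang.reflect.Field;

// Hypothetical stand-in for a library class with a hard-coded limit,
// analogous to protobuf-java's default recursion limit of 100.
class NestingLimitedParser {
    private static int recursionLimit = 100;

    static int getRecursionLimit() {
        return recursionLimit;
    }
}

public class ReflectionPatchDemo {
    // Raise the private static limit via reflection, in the spirit of the
    // ZetaSQLPatcher workaround. Brittle by design: the field name is an
    // implementation detail and may change in a later release.
    static void patchRecursionLimit(int newLimit) throws ReflectiveOperationException {
        Field f = NestingLimitedParser.class.getDeclaredField("recursionLimit");
        f.setAccessible(true);
        f.setInt(null, newLimit); // null receiver: static field
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(NestingLimitedParser.getRecursionLimit()); // 100
        patchRecursionLimit(1000);
        System.out.println(NestingLimitedParser.getRecursionLimit()); // 1000
    }
}
```

Because the field name is looked up as a string at runtime, the compiler cannot catch a rename upstream, which is exactly why the toolkit labels this feature experimental.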
Hi,
We are trying to employ ZetaSQL Toolkit for our SQL-based lineage and decision-making processes. However, with some complex queries (around 20 temp tables and a CTAS script that uses all of the temp tables), we encounter an error that prevents us from properly parsing some of the queries that we use. The code lines below represent a simplified version of our implementation in Java. During execution, the program throws the exception below.
It seems that the `next()` function in the `AnalyzedStatement` class analyzes the query while creating a communication channel between the client and BigQuery itself. It also seems that the `CodedInputStream` class in the `protobuf-java` library configures the limits of this communication channel. According to the error message we got, the default value of `defaultRecursionLimit`, which is 100, is not sufficient for our case. Could you help us deal with this issue?