BigQuery Lineage - sqllineage can't handle some #-comments with commas inside SQL DDL statements #4623

Closed
vgaidass opened this issue Apr 8, 2022 · 0 comments · Fixed by #4662
Labels
bug Bug report

Comments

Contributor

vgaidass commented Apr 8, 2022

Describe the bug
During BigQuery metadata ingestion from exported audit tables, the process fails with the following error:
[image: screenshot of the error]

To Reproduce
Steps to reproduce the behaviour:

  1. Execute a SQL CREATE TABLE AS SELECT statement with a comment that:
    • is located right before SELECT
    • starts with "#" (a valid comment marker in BigQuery) and has no space after "#"
    • has a comma near the start of the comment
/* 
HERE IS A STANDARD COMMENT BLOCK
THIS WILL NOT BREAK sqllineage
*/
CREATE OR REPLACE TABLE `foo.bar.trg_tbl`AS
#This, comment will break sqllineage
SELECT foo
-- this comment will not break sqllineage
# this comment will not break sqllineage either
FROM `foo.bar.src_tbl`  
  2. Have an exported audit logs table prepared in BigQuery
  3. Execute a BigQuery recipe with use_exported_bigquery_audit_metadata: true (version: 0.8.32.4)

Expected behavior
sql_lineage_parser_impl.py and sqllineage should also strip #-comments from the SQL statement.

Additional context
I've tried parsing the SQL statement above at https://sqllineage.herokuapp.com/.
If the comma is placed further from the start of the comment, or the comment ends with a comma, the statement is parsed correctly.

If the comment begins with "#" followed by a space before the comment text, the statement is also parsed correctly.

The only workable solution seems to be converting all "#" comment markers to "--" (the standard SQL comment marker) before parsing, since handling "#" itself might require help from the sqllineage developers.
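The "#" to "--" conversion suggested above could be sketched as a small pre-processing step applied before the statement is handed to the parser. This is only an illustration, not necessarily how the linked fix works: the function name `hash_comments_to_dashes` is hypothetical, and the scanner assumes simple quoting rules (single quotes, double quotes, backticks, backslash escapes), so raw strings and triple-quoted strings that contain quote characters are not handled.

```python
def hash_comments_to_dashes(sql: str) -> str:
    """Rewrite '#' line comments as '-- ' comments (hypothetical helper).

    A '#' inside a quoted string, a backtick-quoted identifier, or an
    existing '--' or '/* */' comment is left untouched.
    """
    out = []
    i, n = 0, len(sql)
    quote = None  # the quote character currently open, if any
    while i < n:
        ch = sql[i]
        if quote:
            # Inside a string or quoted identifier: copy verbatim.
            if ch == "\\" and quote != "`" and i + 1 < n:
                out.append(sql[i:i + 2])  # keep the escaped character
                i += 2
                continue
            out.append(ch)
            if ch == quote:
                quote = None
            i += 1
        elif ch in ("'", '"', "`"):
            quote = ch
            out.append(ch)
            i += 1
        elif sql.startswith("--", i):
            # Existing line comment: copy up to the end of the line.
            j = sql.find("\n", i)
            j = n if j == -1 else j
            out.append(sql[i:j])
            i = j
        elif sql.startswith("/*", i):
            # Block comment: copy through the closing '*/'.
            j = sql.find("*/", i + 2)
            j = n if j == -1 else j + 2
            out.append(sql[i:j])
            i = j
        elif ch == "#":
            # '#' line comment: replace the marker, keep the text.
            j = sql.find("\n", i)
            j = n if j == -1 else j
            out.append("-- " + sql[i + 1:j].lstrip())
            i = j
        else:
            out.append(ch)
            i += 1
    return "".join(out)


sql = (
    "CREATE OR REPLACE TABLE `foo.bar.trg_tbl`AS\n"
    "#This, comment will break sqllineage\n"
    "SELECT foo, '#not a comment' AS s\n"
    "# this comment will not break sqllineage either\n"
    "FROM `foo.bar.src_tbl`"
)
print(hash_comments_to_dashes(sql))
```

Scanning character by character instead of applying a bare regex such as `#.*$` keeps "#" characters inside string literals intact, at the cost of a little more code.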

The issue seems to be quite random. Unfortunately, we cannot control where developers will place their comments in BigQuery SQL and what symbols they'll use :(
