Fix(parser, duckdb): decode/encode in duckdb don't take charset #1993

charsmith · 2023-08-04T19:37:48Z

DuckDB's decode/encode functions don't take a charset parameter.

georgesittas

Thanks for the PR, left a few comments.

I'm not sure if we should alter the base parser for this. I would personally just add a couple of entries in DuckDB's Parser.FUNCTIONS (see how it's done in presto.py), or update the presto dialect so that it routes to _parse_decode/encode in the base parser and reuse that in duckdb.

cc: @tobymao

sqlglot/dialects/duckdb.py

tests/dialects/test_duckdb.py

sqlglot/parser.py

sqlglot/dialects/duckdb.py

charsmith · 2023-08-04T22:42:12Z

sqlglot/dialects/dialect.py

+            self.unsupported(f"Unsupported charset {charset}")
+
+        expression = expression.copy()
+        del expression.args["charset"]


I don't love this, happy to change this if there is a better way.

I think you can just do return self.func(name, expression.this, expression.args.get("replace") if replace else None) (see other comment about replace flag).

georgesittas · 2023-08-04T22:54:33Z

sqlglot/dialects/dialect.py

+    if "charset" in expression.args:
+        charset = expression.args["charset"]


Suggested change

-    if "charset" in expression.args:
-        charset = expression.args["charset"]
+    charset = expression.args.get("charset")
+    if charset:
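The two forms in the suggestion differ slightly in when the branch is taken. The sketch below illustrates this with plain dictionaries, assuming `expression.args` behaves like an ordinary Python dict; the sample values and helper names are made up for illustration, and whether sqlglot ever stores `None` under `charset` is not stated in this thread.

```python
# Plain-dict stand-ins for expression.args; the values are illustrative only.
with_charset = {"this": "col", "charset": "utf-8"}
without_charset = {"this": "col"}
explicit_none = {"this": "col", "charset": None}


def enters_branch_membership(args: dict) -> bool:
    # Original form: the branch runs whenever the key exists,
    # even if the stored value is None.
    return "charset" in args


def enters_branch_get(args: dict) -> bool:
    # Suggested form: one lookup, then a truthiness test on the value,
    # so a missing key and a None value are treated the same way.
    charset = args.get("charset")
    return bool(charset)


for sample in (with_charset, without_charset, explicit_none):
    print(enters_branch_membership(sample), enters_branch_get(sample))
```

The suggested form also avoids looking the key up twice.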

georgesittas · 2023-08-04T22:57:16Z

sqlglot/dialects/duckdb.py

+        FUNCTION_PARSERS = {
+            **parser.Parser.FUNCTION_PARSERS,
+            "ENCODE": lambda self: self._parse_encode_decode(exp.Encode),
+            "DECODE": lambda self: self._parse_encode_decode(exp.Decode),
+        }


We can just do these inline I think, like:

    "DECODE": lambda self: self.expression(
        exp.Decode, this=self._parse_conjunction(), charset=exp.Literal.string("utf-8")
    ),
    "ENCODE": lambda self: self.expression(
        exp.Encode, this=self._parse_conjunction(), charset=exp.Literal.string("utf-8")
    ),
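The pattern of extending the base parser's table with inline entries can be sketched without sqlglot itself. Everything below is a hypothetical, heavily simplified stand-in: `BaseParser`, `DuckDBLikeParser`, `_parse_argument`, and the dict-shaped nodes are not sqlglot APIs, and only the idea of copying the base table and adding `DECODE`/`ENCODE` entries that default the charset to utf-8 comes from the comment above.

```python
from typing import Callable, Dict, List


class BaseParser:
    """Hypothetical stand-in for a base parser with a function-parser table."""

    FUNCTION_PARSERS: Dict[str, Callable[..., dict]] = {
        "CAST": lambda self: {"type": "Cast", "this": self._parse_argument()},
    }

    def __init__(self, tokens: List[str]) -> None:
        self._tokens = list(tokens)

    def _parse_argument(self) -> str:
        # Consume and return the next token as the function's single argument.
        return self._tokens.pop(0)

    def parse_function(self, name: str) -> dict:
        # Look the parser up on the instance's class so subclasses can override.
        return self.FUNCTION_PARSERS[name.upper()](self)


class DuckDBLikeParser(BaseParser):
    # Copy the base entries, then add dialect-specific ones inline.
    # The charset is fixed to utf-8 because no charset argument is parsed.
    FUNCTION_PARSERS = {
        **BaseParser.FUNCTION_PARSERS,
        "DECODE": lambda self: {
            "type": "Decode", "this": self._parse_argument(), "charset": "utf-8"
        },
        "ENCODE": lambda self: {
            "type": "Encode", "this": self._parse_argument(), "charset": "utf-8"
        },
    }


print(DuckDBLikeParser(["x"]).parse_function("decode"))
```

Because the subclass builds a new dict, the base class table is left untouched and other dialects are unaffected.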

georgesittas · 2023-08-04T22:58:15Z

sqlglot/dialects/dialect.py

@@ -571,6 +571,20 @@ def datestrtodate_sql(self: Generator, expression: exp.DateStrToDate) -> str:
    return self.sql(exp.cast(expression.this, "date"))


+# Used for Presto and Duckdb which use functions that don't support charset, and assume utf-8
+def encode_decode_sql(self: Generator, expression: exp.Expression, name: str) -> str:


I think you also need a boolean flag here to control whether or not replace will be generated.
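A rough sketch of how the helper might look with such a flag follows. It is not the merged implementation: `StubExpression` and `StubGenerator` are hypothetical stand-ins so the example runs without sqlglot, arguments are plain strings rather than expression nodes, the stub's `func` is assumed to drop `None` arguments, and the non-utf-8 check is an assumption based on the code comment in the hunk above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class StubExpression:
    """Hypothetical stand-in for an Encode/Decode expression node."""

    this: str
    args: Dict[str, Any] = field(default_factory=dict)


class StubGenerator:
    """Hypothetical stand-in for a generator; it only records warnings."""

    def __init__(self) -> None:
        self.unsupported_messages: List[str] = []

    def unsupported(self, message: str) -> None:
        self.unsupported_messages.append(message)

    def func(self, name: str, *args: Optional[str]) -> str:
        # Assumption: None arguments are skipped when rendering the call.
        rendered = ", ".join(arg for arg in args if arg is not None)
        return f"{name}({rendered})"


def encode_decode_sql(
    gen: StubGenerator, expression: StubExpression, name: str, replace: bool = True
) -> str:
    # Warn when a charset other than utf-8 was requested, since the target
    # function takes no charset argument and assumes utf-8.
    charset = expression.args.get("charset")
    if charset and charset.lower() != "utf-8":
        gen.unsupported(f"Unsupported charset {charset}")

    # Emit the optional "replace" argument only when the caller allows it.
    return gen.func(
        name, expression.this, expression.args.get("replace") if replace else None
    )


gen = StubGenerator()
print(encode_decode_sql(gen, StubExpression("x", {"charset": "utf-8"}), "ENCODE", replace=False))
```

Building the call from `expression.this` directly also avoids copying the expression and deleting its `charset` entry.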

georgesittas · 2023-08-04T23:00:44Z

Looks good, just a few final comments.

georgesittas · 2023-08-04T23:09:17Z

@charsmith I'll wrap this up, should be just a couple of changes. Thanks!

Fix(parser, duckdb): decode/encode in duckdb don't take charset

d9f5b53

georgesittas reviewed Aug 4, 2023

sqlglot/dialects/duckdb.py (outdated)

sqlglot/dialects/duckdb.py (outdated)

tests/dialects/test_duckdb.py

georgesittas reviewed Aug 4, 2023

sqlglot/parser.py (outdated)

tobymao reviewed Aug 4, 2023

sqlglot/dialects/duckdb.py (outdated)

tobymao reviewed Aug 4, 2023

sqlglot/dialects/duckdb.py (outdated)

charsmith added 2 commits August 4, 2023 18:37

refactor from code review

d81fe86

prefer Generator.unsupported

141faee

charsmith commented Aug 4, 2023

georgesittas reviewed Aug 4, 2023

georgesittas merged commit c9dd971 into tobymao:main Aug 4, 2023
