-
Notifications
You must be signed in to change notification settings - Fork 769
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix(parser, duckdb): decode/encode in duckdb don't take charset #1993
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, left a few of comments.
I'm not sure if we should alter the base parser for this. I would personally just add a couple of entries in DuckDB's Parser.FUNCTIONS
(see how it's done in presto.py), or update the presto dialect so that it routes to _parse_decode/encode
in the base parser and reuse that in duckdb.
cc: @tobymao
self.unsupported(f"Unsupported charset {charset}") | ||
|
||
expression = expression.copy() | ||
del expression.args["charset"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love this, happy to change this if there is a better way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can just do return self.func(name, expression.this, expression.args.get("replace") if replace else None)
(see other comment about replace
flag).
if "charset" in expression.args: | ||
charset = expression.args["charset"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if "charset" in expression.args: | |
charset = expression.args["charset"] | |
charset = expression.args.get("charset") | |
if charset: |
FUNCTION_PARSERS = { | ||
**parser.Parser.FUNCTION_PARSERS, | ||
"ENCODE": lambda self: self._parse_encode_decode(exp.Encode), | ||
"DECODE": lambda self: self._parse_encode_decode(exp.Decode), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can just do these inline I think, like:
"DECODE": lambda self: self.expression(
exp.Decode, this=self._parse_conjunction(), charset=exp.Literal.string("utf-8")
),
"ENCODE": lambda self: self.expression(
exp.Encode, this=self._parse_conjunction(), charset=exp.Literal.string("utf-8")
),
@@ -571,6 +571,20 @@ def datestrtodate_sql(self: Generator, expression: exp.DateStrToDate) -> str: | |||
return self.sql(exp.cast(expression.this, "date")) | |||
|
|||
|
|||
# Used for Presto and Duckdb which use functions that don't support charset, and assume utf-8 | |||
def encode_decode_sql(self: Generator, expression: exp.Expression, name: str) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you also need a boolean flag here to control whether or not replace
will be generated.
Looks good, just a few final comments. |
@charsmith I'll wrap this up, should be just a couple of changes. Thanks! |
Duckdb doesn't take a param for charset on decode/encode.