Feat: add apache doris dialect #2006
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,105 @@ | ||
from __future__ import annotations | ||
|
||
import typing as t | ||
|
||
from sqlglot import exp, generator | ||
from sqlglot.dialects.dialect import ( | ||
approx_count_distinct_sql, | ||
arrow_json_extract_sql, | ||
rename_func, | ||
) | ||
from sqlglot.dialects.mysql import MySQL | ||
from sqlglot.helper import seq_get | ||
|
||
|
||
def _to_date_sql(self: MySQL.Generator, expression: exp.TsOrDsToDate) -> str: | ||
this = self.sql(expression, "this") | ||
self.format_time(expression) | ||
return f"TO_DATE({this})" | ||
|
||
|
||
def _time_format( | ||
self: generator.Generator, expression: exp.UnixToStr | exp.StrToUnix | ||
) -> t.Optional[str]: | ||
time_format = self.format_time(expression) | ||
if time_format == Doris.TIME_FORMAT: | ||
return None | ||
return time_format | ||
Comment on lines +21 to +27:
We should factor this out into a helper in
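One way the factoring suggested in this comment could look is sketched below. It is only an illustration: `make_time_format` is a hypothetical name, and the sketch works on an already-rendered format string rather than calling `self.format_time(expression)` as the diff does, so that it runs without sqlglot.

```python
from typing import Callable, Optional


def make_time_format(default_format: str) -> Callable[[str], Optional[str]]:
    """Build a helper that suppresses a format equal to the dialect default.

    Hypothetical helper, not part of sqlglot. It mirrors the check in the
    diff's `_time_format`, but takes the rendered format string directly.
    """

    def _time_format(rendered_format: str) -> Optional[str]:
        # Returning None lets the caller omit the format argument entirely.
        if rendered_format == default_format:
            return None
        return rendered_format

    return _time_format


# The default below is the TIME_FORMAT value from the diff.
doris_time_format = make_time_format("'yyyy-MM-dd HH:mm:ss'")
```

With a factory like this, each dialect would pass its own default instead of redefining the same function body.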
||
|
||
|
||
class Doris(MySQL): | ||
georgesittas marked this conversation as resolved.
|
||
DATE_FORMAT = "'yyyy-MM-dd'" | ||
DATEINT_FORMAT = "'yyyyMMdd'" | ||
TIME_FORMAT = "'yyyy-MM-dd HH:mm:ss'" | ||
|
||
TIME_MAPPING = { | ||
Several entries here are also in MySQL's
Yes, MySQL and Doris share some mappings, but we found that some mappings are not supported when going from Hive to Doris, so we added entries on top of the existing ones to handle Hive-to-Doris syntax conversion.
How similar is it to StarRocks? We support that as well.
Doris is about as compatible with StarRocks as it is with MySQL. StarRocks began as a fork of Doris and went its own way, and as the community develops, the two will diverge further. I plan to spend more time improving the conversion of different data sources into Doris. Thanks.
||
"%M": "%B", | ||
"%m": "%%-M", | ||
"%c": "%-m", | ||
"%e": "%-d", | ||
"%h": "%I", | ||
"%S": "%S", | ||
"%u": "%W", | ||
"%k": "%-H", | ||
"%l": "%-I", | ||
"%W": "%a", | ||
"%Y": "%Y", | ||
"%d": "%%-d", | ||
"%H": "%%-H", | ||
"%s": "%%-S", | ||
"%D": "%%-j", | ||
"%a": "%%p", | ||
"%y": "%%Y", | ||
"%": "%%", | ||
} | ||
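The overlap with the parent mapping raised in the review could be avoided by extending the parent's dictionary and listing only the entries that differ. The sketch below assumes that approach; `PARENT_TIME_MAPPING` is a stand-in whose entries are copied from this diff for illustration, not read from sqlglot, and `extend_mapping` is a hypothetical name.

```python
from typing import Dict

# Stand-in for the parent dialect's mapping (illustrative entries only).
PARENT_TIME_MAPPING: Dict[str, str] = {
    "%M": "%B",
    "%c": "%-m",
    "%e": "%-d",
}

# Only the entries the child dialect adds or changes (taken from the diff).
CHILD_OVERRIDES: Dict[str, str] = {
    "%d": "%%-d",
    "%H": "%%-H",
}


def extend_mapping(parent: Dict[str, str], overrides: Dict[str, str]) -> Dict[str, str]:
    """Return a new mapping; on a key clash the override wins."""
    return {**parent, **overrides}


TIME_MAPPING = extend_mapping(PARENT_TIME_MAPPING, CHILD_OVERRIDES)
```

Inside a dialect class the same effect would be written inline as `{**Parent.TIME_MAPPING, ...}`, which is the pattern the diff already uses for `FUNCTIONS`, `TYPE_MAPPING`, and `TRANSFORMS`.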
|
||
class Parser(MySQL.Parser): | ||
FUNCTIONS = { | ||
**MySQL.Parser.FUNCTIONS, | ||
"DATE_TRUNC": lambda args: exp.TimestampTrunc( | ||
this=seq_get(args, 1), unit=seq_get(args, 0) | ||
), | ||
Comment on lines +59 to +61:
We do this exact thing in both Postgres and Starrocks already. Let's dry it out into a helper in
Let's dry it out into a helper in dialect.py.
Yes, DRY means "don't repeat yourself".
Please clean up any instance of copy and paste.
||
"REGEXP": exp.RegexpLike.from_arg_list, | ||
} | ||
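The shared helper requested in the review might look like the sketch below. It is an assumption-laden illustration: `parse_timestamp_trunc` is a hypothetical name, and `TimestampTrunc` and `seq_get` are local stand-ins for `exp.TimestampTrunc` and `sqlglot.helper.seq_get` so that the example runs without sqlglot installed.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Sequence


def seq_get(seq: Sequence[Any], index: int) -> Optional[Any]:
    """Stand-in for a bounds-safe lookup: None when the index is out of range."""
    try:
        return seq[index]
    except IndexError:
        return None


@dataclass
class TimestampTrunc:
    """Stand-in for the expression node built by the diff's lambda."""

    this: Any
    unit: Any


def parse_timestamp_trunc(args: Sequence[Any]) -> TimestampTrunc:
    # As in the diff: argument 0 is the unit, argument 1 is the value.
    return TimestampTrunc(this=seq_get(args, 1), unit=seq_get(args, 0))


# Each dialect would reference the one helper instead of repeating the lambda.
FUNCTIONS: Dict[str, Callable[[Sequence[Any]], TimestampTrunc]] = {
    "DATE_TRUNC": parse_timestamp_trunc,
}
```

Defining the function once and referencing it from each dialect's `FUNCTIONS` table keeps the argument order in a single place.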
|
||
class Generator(MySQL.Generator): | ||
CAST_MAPPING = {} | ||
|
||
TYPE_MAPPING = { | ||
**MySQL.Generator.TYPE_MAPPING, | ||
exp.DataType.Type.TEXT: "STRING", | ||
exp.DataType.Type.TIMESTAMP: "DATETIME", | ||
exp.DataType.Type.TIMESTAMPTZ: "DATETIME", | ||
} | ||
|
||
TRANSFORMS = { | ||
georgesittas marked this conversation as resolved.
|
||
**MySQL.Generator.TRANSFORMS, | ||
exp.ApproxDistinct: approx_count_distinct_sql, | ||
exp.ArrayAgg: rename_func("COLLECT_LIST"), | ||
exp.Coalesce: rename_func("NVL"), | ||
exp.CurrentTimestamp: lambda *_: "NOW()", | ||
exp.DateTrunc: lambda self, e: self.func( | ||
"DATE_TRUNC", e.this, "'" + e.text("unit") + "'" | ||
), | ||
exp.JSONExtractScalar: arrow_json_extract_sql, | ||
exp.JSONExtract: arrow_json_extract_sql, | ||
exp.RegexpLike: rename_func("REGEXP"), | ||
georgesittas marked this conversation as resolved.
|
||
exp.RegexpSplit: rename_func("SPLIT_BY_STRING"), | ||
exp.SetAgg: rename_func("COLLECT_SET"), | ||
exp.StrToUnix: lambda self, e: f"UNIX_TIMESTAMP({self.sql(e, 'this')}, {self.format_time(e)})", | ||
exp.Split: rename_func("SPLIT_BY_STRING"), | ||
exp.TimeStrToDate: rename_func("TO_DATE"), | ||
exp.ToChar: lambda self, e: f"DATE_FORMAT({self.sql(e, 'this')}, {self.format_time(e)})", | ||
exp.TsOrDsAdd: lambda self, e: f"DATE_ADD({self.sql(e, 'this')}, {self.sql(e, 'expression')})", # Only for day level | ||
exp.TsOrDsToDate: lambda self, e: self.func("TO_DATE", e.this), | ||
exp.TimeStrToUnix: rename_func("UNIX_TIMESTAMP"), | ||
This should be removed; it already exists in MySQL.
OK, thanks.
||
exp.TimeToUnix: rename_func("UNIX_TIMESTAMP"), | ||
exp.TimestampTrunc: lambda self, e: self.func( | ||
"DATE_TRUNC", e.this, "'" + e.text("unit") + "'" | ||
), | ||
exp.UnixToStr: lambda self, e: self.func( | ||
"FROM_UNIXTIME", e.this, _time_format(self, e) | ||
), | ||
exp.UnixToTime: rename_func("FROM_UNIXTIME"), | ||
exp.Map: rename_func("ARRAY_MAP"), | ||
} |
This is unnecessary, let's get rid of it.