Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat(teradata): add support for the SAMPLE clause #2169

Merged
merged 2 commits into from
Sep 6, 2023
Merged

Conversation

georgesittas
Copy link
Collaborator

Motivated by #2161

@tobymao tobymao merged commit d9f8910 into main Sep 6, 2023
@tobymao tobymao deleted the jo/teradata_sample branch September 6, 2023 17:48
@Ashay4u
Copy link

Ashay4u commented Sep 7, 2023

Pardon my limited understanding here, but where is tablesample_sql from teradata.py called from? I did not reach that function in my copy of the code

@georgesittas
Copy link
Collaborator Author

@Ashay4u it's called from the base generator's sql method when calling transpile on a query that contains the SAMPLE clause. What do you mean you didn't reach that function?

@Ashay4u
Copy link

Ashay4u commented Sep 7, 2023

I added a breakpoint on def tablesample_sql and its return statement in terdata.py file and the code did not stop there during debug. It does not arrive at the breakpoint for tablesample_sql in generator.py during transpile too. Am i missing a change somewhere
Also the output that I see after running the code is as below. Notice there are no values in the sample clause

SELECT
  *
FROM Employee Sample ()

@georgesittas
Copy link
Collaborator Author

Can you share your example so I can try to reproduce this? What I've tried so far reaches that code.

@Ashay4u
Copy link

Ashay4u commented Sep 7, 2023

Code below

sql="""
SELECT * FROM 
	Employee 
	Sample 0.33, .25, .1;
"""

try:
    new_query = sqlglot.transpile(sql, read='teradata', write='databricks', pretty=True)[0]
    print(new_query)
except sqlglot.errors.ParseError as error:
    print(traceback.format_exc())
    print(error.errors)

My code does not stop at tablesample_sql in generator.py too during transpile. Am i missing a change somewhere

@georgesittas
Copy link
Collaborator Author

georgesittas commented Sep 7, 2023

Yeah, it doesn't reach that method because you're transpiling to databricks. That doesn't work currently, I only added support for parsing/generating in Teradata, because the transpilation is out of scope for now.

See related comment: #2161 (comment).

@Ashay4u
Copy link

Ashay4u commented Sep 7, 2023

Thanks for clarifying. Where should I make changes to the code to try and fix this? I will try to get this to some acceptable fix

@georgesittas
Copy link
Collaborator Author

georgesittas commented Sep 7, 2023

Ideally you need to parse this into a canonical representation, so that:

  1. You don't lose information when transpiling to Teradata, i.e. you need to preserve SAMPLE 0.33, 0.25, 0.1.
  2. You can generate valid SQL when targeting other dialects from that canonical representation.

From what I saw, in order to implement (2), you need to check what kind of numbers you have in the SAMPLE. In your example, 0.33 would be converted to 33 PERCENT, whereas SAMPLE 4 would probably be converted into 4 ROWS.

I see you mentioned ... TABLESAMPLE 33 PERCENT as the expected Databricks output. Why is that? Aren't you dropping 0.25 and 0.1 in that case, or is that an acceptable information loss?

@Ashay4u
Copy link

Ashay4u commented Sep 7, 2023

For my use case, its an acceptable loss. Is there an easy fix for it?

@georgesittas
Copy link
Collaborator Author

georgesittas commented Sep 7, 2023

The fix is not trivial, at least if you're not experienced with SQLGlot's codebase. One idea is to override _parse_table_sample in Teradata and:

  • Call super()._parse_table_sample
  • If that returns a TableSample, you can check the first arg of expressions and depending on what kind of number it is you can also set another arg like percent, rows etc.

See my edit in the above reply too.

@Ashay4u
Copy link

Ashay4u commented Sep 7, 2023

Thanks. Will give it a try. How do I familiarize myself with the codebase details. Is there documentation I can refer or just debug through the code to understand? Again, thanks a lot for being so prompt and helpful

@georgesittas
Copy link
Collaborator Author

georgesittas commented Sep 7, 2023

Sure thing, glad you want to contribute. You can also join our slack btw if you want to chat.

How do I familiarize myself with the codebase details. Is there documentation I can refer or just debug through the code to understand?

Yeah I'd suggest debugging through the code and looking into past issues & their fixes to understand the work flow. I think the above outline should cover your use case, so try to work around that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants