Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Execute .SQL files #39

Closed
maddy2u opened this issue Jun 19, 2021 · 9 comments
Closed

Execute .SQL files #39

maddy2u opened this issue Jun 19, 2021 · 9 comments
Labels
enhancement New feature or request

Comments

@maddy2u
Copy link

maddy2u commented Jun 19, 2021

I want to be able to execute full .SQL files by passing them through the driver. How can i do that?

@Brooke-white
Copy link
Contributor

Hi @maddy2u ,

The driver does not support that functionality at this time, so some pre-proccessing would need to occur in your script before passing the individual statements to the driver. Something like the code snippit below could work:

import redshift_connector
import os

MY_FILE: str = os.environ.get('SQL_SMTS', None)

def preprocess_sql_file(filepath: str) -> list[str]:
   """Returns a list of individual SQL statements"""
  raise NotImplementedError

with redshift_connector.connect(...) as conn:
   with conn.cursor() as cursor:
      for stmt in preprocess_sql_file(MY_FILE):
         cursor.execute(stmt)
   conn.commit() # optionally, commit changes

I will chat with the team to see if we can add support for executing .SQL files directly and update this issue with my findings.

@maddy2u
Copy link
Author

maddy2u commented Jun 25, 2021

Thanks. Any update on the above. Would be a good functionality to have inbuilt.

@Brooke-white
Copy link
Contributor

Hey @maddy2u ,

To support executing .SQL files directly we would need to re-work how the driver sends statements to the Amazon Redshift server. Currently, prepared statements are used in all cases-- which prevents us from supporting mult-statements such as:

cursor.execute("SELECT 1; SELECT 2; SELECT 3;")

this also prevents us from supporting .SQL file execution. We would need to move from using prepared statements to statements to support this.

I'm working to gauge how much re-work this change would require and get an idea about how this would impact the drivers performance.

@maddy2u
Copy link
Author

maddy2u commented Jun 26, 2021

Makes sense. Looking forward to your response @Brooke-white ! Thank you.

@Brooke-white
Copy link
Contributor

Hi @maddy2u ,

At this time we do not plan to add functionality to execute .SQL files using redshift_connector. Parsing SQL is a big undertaking -- enough so that an entire project, sqlparse, has been dedicated to it.

We are continuing to scope work for supporting multi-statement execution.

Here's an example of how sqlparse could be used for executing .SQL files with redshift_connector. Please see the documentation for sqlparse here for more details.

import redshift_connector
import sqlparse

with open('data.sql', 'r') as file:
    raw_sql_file = file.read()
    sql_stmts = sqlparse.split(raw_sql_file)

    with redshift_connector.connect(...) as conn:
        with conn.cursor() as cursor:
            for stmt in sql_stmts:
                cursor.execute(stmt)

@liuming
Copy link

liuming commented Sep 7, 2021

@Brooke-white thanks for advising a workaround. It'll be nice to support multiple statements in a single execute call. I notice psycopg allow me to do this, but I was not able to identify exactly which line from its source code provides such support.

@Brooke-white
Copy link
Contributor

HI @liuming ,

No problem -- yes I agree it'd be much cleaner (and simpler) to support this in a single call to execute.

I believe psycopg supports this due to the approach taken to executing the statement. There are two approaches a driver can take to execute a statement -- simple query cycle and extended query cycle. Simple query cycle supports executing multiple statements, while extended query protocol does not. redshift_connector uses extended query protocol, which is the root cause for our lack of support. If you're interested, the above links give an interesting overview of the two approaches and their trade offs.

When I have extra time I am working to add support for simple query cycle (i.e. multiple statement execution), but there's no timeline on when this would be available as we are prioritizing other tasks at this time. Ideally, I'd like to provide support for both extended and simple query cycles so users have the option to fine tune how statements are executed :)

@liuming
Copy link

liuming commented Sep 15, 2021

@Brooke-white I really appreciate you took the time to link these documentations and also provide a surprising detailed reply. I learn something new today. Thank you. 👏

@Brooke-white
Copy link
Contributor

closing this out as we won't be supporting direct execution of SQL files, or multi-statements in the short term

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants