Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

postrgresql_query cannot handle .sql file with \n at end of file #180

Closed
jtelcontar opened this issue Jan 14, 2022 · 22 comments · Fixed by #192
Closed

postrgresql_query cannot handle .sql file with \n at end of file #180

jtelcontar opened this issue Jan 14, 2022 · 22 comments · Fixed by #192

Comments

@jtelcontar
Copy link

SUMMARY

We are using a playbook task to run an SQL file that is generated by a PHP script. The PHP script uses file_put_contents() to create the .sql file based upon an array of queries (could be any number of queries). All queries end with a semicolon. The file is generated with a \n at the end of the file every time, and there is no way to generate the file without it. Whenever I run the task, I receive a fatal error (Cannot execute SQL '' None: can't execute an empty query). The only way to solve this is to open up the .sql file in an editor, delete the \n character, and run the playbook again.

I tried everything I can think of to attempt to trim that \n during the file generation, but no luck. Hoping you all can help or confirm this is a feature not a bug.

ISSUE TYPE
  • Bug Report
COMPONENT NAME

postgresql_query

ANSIBLE VERSION
ansible [core 2.11.7]
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/ahart/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python2.7/dist-packages/ansible
  ansible collection location = /home/ahart/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 2.7.16 (default, Oct 10 2019, 22:02:15) [GCC 8.3.0]
  jinja version = 2.11.3
  libyaml = False
COLLECTION VERSION
# /usr/local/lib/python2.7/dist-packages/ansible_collections
Collection           Version
-------------------- -------
community.postgresql 1.6.0
CONFIGURATION
DEFAULT_FORKS(/etc/ansible/ansible.cfg) = 100
DEPRECATION_WARNINGS(/etc/ansible/ansible.cfg) = False
DISPLAY_SKIPPED_HOSTS(/etc/ansible/ansible.cfg) = False
OS / ENVIRONMENT

Debian 10
PHP 7.4
Python 2.7 ( cannot upgrade to Python 3 yet on this server)

STEPS TO REPRODUCE

Generate a text file containing a couple of postgres queries in PHP with file_put_contents(). Open the file in an editor and note that it contains a new line at the end of the file.

Run a playbook that contains the following simple task:

- name: Testing postgres query with path_to_script
  become: true
  become_user: postgres
  postgresql_query:
    db: "db_name"
    path_to_script: "path/to/generated/script"
EXPECTED RESULTS

Task finishes with no fatal error and queries from our file have been run successfully.

ACTUAL RESULTS

The task fails with a fatal error due to the query list having a blank query in it.

fatal: [hostname.goes.here]: FAILED! => {"changed": false, "msg": "Cannot execute SQL '' None: can't execute an empty query, query list: [\"UPDATE users SET mfa_enabled=true WHERE login='login_name_1'\", \"\\nUPDATE users SET mfa_enabled=true WHERE login='login_name_2'\", '']"}

(I've cleaned the resulting error message of our data.)

Thanks for any assistance you can offer!

@Andersson007
Copy link
Collaborator

@jtelcontar thanks for reporting this! Have you tried to use as_single_query: yes with the task?

@jtelcontar
Copy link
Author

@Andersson007 Thanks for the suggestion!

I just tested with as_single_query: yes and the task completed successfully.

When I was reading the Ansible documentation for the postgresql_query module (https://docs.ansible.com/ansible/latest/collections/community/postgresql/postgresql_query_module.html), it seemed like we should keep the value as no since we had multiple queries that were semicolon-separated. Was it a misunderstanding of the purpose of that parameter?

@Andersson007
Copy link
Collaborator

@jtelcontar thanks for the feedback! Your issue has reminded me that we planned to set yes as a default in the next major release, i.e. 2.0.0.
Anyway this is a bug. I'll try to take a look later.
This path_to_script option is a source of pain and the module is a good example of not great design...:) My fault among others:)

@Andersson007
Copy link
Collaborator

Andersson007 commented Jan 18, 2022

@jtelcontar could you please try to use the postgresql_db module with state restore and let me know if it works?
No need for now, thank you.

@jtelcontar
Copy link
Author

@Andersson007 Gotcha! Totally understandable. I think all of us have sources of pain in our projects. :) Thanks for replying. I'll definitely keep an eye on this issue. Let me know if you'd like me to test anything else.

@Andersson007
Copy link
Collaborator

@Andersson007 Gotcha! Totally understandable. I think all of us have sources of pain in our projects. :) Thanks for replying. I'll definitely keep an eye on this issue. Let me know if you'd like me to test anything else.

@jtelcontar hanks for offering help! Yes, it would be much appreciated.

@jtelcontar
Copy link
Author

@Andersson007 Checking in to see if there are any updates on your end. No worries if not; let me know when would be a good time to circle back around to review any updates. Many thanks!

@Andersson007
Copy link
Collaborator

@jtelcontar I've just put it in my todo for tomorrow morning. I'll investigate the situation and will propose something (probably several options) here to discuss among us (will ping our other contributors), thanks! :)

@jtelcontar
Copy link
Author

Awesome, appreciate the info! I'll keep an eye out. :)

@Andersson007
Copy link
Collaborator

cc @tcraxs @hunleyd @jtelcontar
The path_to_script option is a source of pain (also the as_single_query option has been introduced to mitigate things there). I can see the following options:

  1. To leave it as it is but set the default of as_single_query as True as was planned for the next major release. Here's the PR postgresql_query: set as_single_query default as True #185
  2. To deprecate the path_to_script and as_single_query arguments and remove them in the next major release (resp. I'll close the PR from the above sentence). We can suggest users using the postgresql_db module with state restore. What do you think? If pg_restore is able to load any scripts that path_to_script can, I think it should work or I missed something? Yeah, it would be a tough breaking change but anyway.

The current implementation looks ugly - it's just an extra thing. postgresql_db should be used to load dumps / scripts. The only difference is that users can't get return values BUT in a current state:

  • If they use as_single_query: yes (it's necessary in some cases because the module uses semicolons as query delimiter but semicolons can also be a part of functions), they will get only the result of the last statement.
  • If they use as_single_query: no, their scripts must not contain any statements containing semicolons. Semicolons can be used only as separators of the statements.

What's your opinion, folks? 1) or 2) ? I'm not in favor of any of them as I'm not a user and not an expert. If i had been able to implement the module again, i wouldn't have introduced path_to_script at all. But maybe we could leave it and set as_single_query: yes as its default?

@hunleyd
Copy link
Collaborator

hunleyd commented Feb 8, 2022

As a user, I'd never equate postgresql_db with 'run my SQL script', so I'm gonna -1 option 2. So I guess that means I'm +1 on option 1.

Is it worth maybe down the road introducing a new module postgresql_query_file and making postgresql_query only useful for executing a single query?

@DanScharon
Copy link

I'd personally prefer 1), as postgresql_db seems to be intended for the restore or initialization of a whole database, whereas postgresql_query could run a (set of) scripts on an existing database that maybe add (optional) indexes, tables depending on a set of conditions.

@jtelcontar
Copy link
Author

jtelcontar commented Feb 8, 2022

I agree that postgresql_db seems intended for restore/initialization of the whole database (and could "scare" people-- will their DB be completely replaced with the handful of queries they are trying to run, or leave the DB in a weird state?). Which is the main reason I didn't consider it an option for what I was trying to do with a single file that contained updates/inserts.

I do like the idea of separating out postgresql_query (single query only) and postgresql_query_file (handles a file with multiple queries).

My only concern with option 1: are there implications of using as_single_query: True if you want to get results for each individual query and not just the last one? Does someone with that use case have any options available to them? Or do they just need to separate out each of their queries individually and run postgresql_query for each one as a separate call?

@Andersson007
Copy link
Collaborator

Andersson007 commented Feb 9, 2022

Thank you folks!

@jtelcontar if you use as_single_query: True only the last query result will be returned. And this had been a default behavior in the past before query_all_results, etc. were introduced .. and this mechanism has turned out to be a pain:)

So we also have option 3 - a separate module called postgresql_query_file or maybe postgresql_script. We could move out all file related stuff (i.e. path_to_script, as_single_query, etc.) to that dedicated module. I like the idea but how about trade-offs, is it really worth introducing breaking changes?
We could

  1. create a separate module postgresql_query_file or postgresql_script
  2. add a deprecation warning to postgresql_query module when the file related options are used redirecting folks to the new module
  3. announce the deprecation and removing in the next major release
  4. remove the options in 6 months - 1 year
  5. it will allow us to move out the pain from one place ...into another place:)

So @hunleyd @DanScharon @jtelcontar @tcraxs questions are:

  1. How about the option 3
  2. Would you like to write the module? I could but it seems to be very simple, so don't miss the chance. Of course, we can help and we could put all guys involved as co-authors
  3. Which options we should remove from postgresql_query: imo it's obviously path_to_script and as_single_query and corresponding ret values (containing all).
  4. If you OK with adding the new module, please suggest
    a) possible module's name (feel free to suggest more options) and
    b) its interface (i.e. a set of arguments with values), it can be, for example
postgresql_script:
  path: /path/to/script # maybe src?
  (as_single_query: yes/no) # but maybe we should get rid of it and always run file's content as a single query and return only the last statement result?
...

@Andersson007
Copy link
Collaborator

in addition to the above, if someone reviews #185, i could release 2.0.0 tomorrow provided that there are no objections

@hunleyd
Copy link
Collaborator

hunleyd commented Feb 9, 2022

How about the option 3

+1

Which options we should remove from postgresql_query: imo it's obviously path_to_script and as_single_query and corresponding ret values (containing all).

Those two seem to be the only ones needed IMHO

If you OK with adding the new module, please suggest
a) possible module's name (feel free to suggest more options) and

i prefer postgresql_script as the name (since foo.sql is a 'sql script' in my mind).

b) its interface (i.e. a set of arguments with values), it can be, for example

We need path for sure. I'm not entirely keen on forcing as_single_query, but if we don't do that then we should probably have an option to cover ON_ERROR_STOP behavior. Perhaps I don't understand what's happening when you have a script file with multiple SQL statements and are using as_single_query.. does that run them all in the same transaction? If not, why wouldn't you just loop over postgresql_query?

@Andersson007
Copy link
Collaborator

@hunleyd

Perhaps I don't understand what's happening when you have a script file with multiple SQL statements and are using as_single_query..

When we have path_to_script specified:

  1. if as_single_query: yes, under the hood, we take the whole content of the file with Python's f.read() (we don't care about what's inside, a single query or multiple queries) and execute it all at once, i.e. we pass the whole content to the psycopg2 connector. The drawback is that, when there are multiple queries in the file, the connector returns the result of the last query, so we can't fill up query_all_results, etc. The module used to work only this way from the beginning but later there was a feature request to get results per query when reading from a files. It seems logical but i shouldn't blindly introduce it.. unfortunately it was a time of stagnation in the collection in terms of other contributors activity). This is the context. After that bugreports started appearing because the change was breaking (unexpectedly). That's why as_single_query was introduced. And we didn't set it to yes by default from the beginning not to break things again. Then we decided to change it properly via a major release (thank's for reviewing!)
  2. when as_single_query: no, the script content will be split by semicolons and each element will be put in a list and later the elements will be passed to the connector in loop as queries. It allows to get and return a result of each query (i.e. to fill up query_all_results, etc. Feels cool but the problem, which we missed when we were introducing the feature, is that semicolons can be an inner part of functions / strings, etc, and, at least for me, it seems tricky how to solve this. I think it's possible but it would make the logic very complicated which would probably bring more bugs than value...

I'm not sure whether we should have this logic in postgresql_script. So in the new module we can:

  1. Keep things simple, i.e. always execute file content as a whole (no args such as_single_query and no return values per query such as query_all_results. Pros: Simple inner logic. Cons: Absence of results per query. Users will be angry:)
  2. Move everything related to scripts from postgresql_query to postgresql_script as it is (including the current logic described above, as_single_query and query_all_results, etc. Pros: Presents of results per query. Users will be less angry:) Cons: Complicated logic.

My painful experience is telling me that it's important to make right decisions now, on a design stage.

Ideas?

@DanScharon
Copy link

DanScharon commented Feb 10, 2022

I'm not sure whether we should have this logic in postgresql_script. So in the new module we can:

  1. Keep things simple, i.e. always execute file content as a whole (no args such as_single_query and no return values per query such as query_all_results. Pros: Simple inner logic. Cons: Absence of results per query. Users will be angry:)

  2. Move everything related to scripts from postgresql_query to postgresql_script as it is (including the current logic described above, as_single_query and query_all_results, etc. Pros: Presents of results per query. Users will be less angry:) Cons: Complicated logic.

If there's a new module postgresql_script I'd personally prefer 1, because if I want results of individual queries I could still use postgresql_query and pass those queries instead of a (script) file. For me the whole point of a module postgresql_script is to run each script it is fed as a whole and optionally loop over a list of script files.

@hunleyd
Copy link
Collaborator

hunleyd commented Feb 10, 2022

thx for the detailed write-up @Andersson007 , much appreciated. i now agree with @DanScharon 's comment above

@Andersson007
Copy link
Collaborator

@hunleyd your welcome!

I've just created a draft PR #188. Details and the plan in the description.

@Andersson007
Copy link
Collaborator

Could anyone please review the bugfix for this particular issue #192 ?

@domainfun
Copy link
Contributor

@jtelcontar: Just to clarify, ANY text file SHOULD have a line break at the end of the file, i.e. the last line should be terminated properly (otherwise it's an "incomplete line"). It goes without saying that even a file with just one line should end with a line break. This is actually defined in the POSIX standards, see the POSIX definition of a Line - there is even a definition of an Incomplete Line.

I have seen lots of software IDEs and editors which create text files without ensuring that the last line is terminated properly. Any good IDE or editor will have a setting to "ensure every saved file ends with a line break".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants