I believe that the async_connect code violates the libpq documentation by assuming that the underlying socket file descriptor will not change over the course of the connection sequence. The relevant section of the documentation is below:
If PQconnectStart or PQconnectStartParams succeeds, the next stage is to poll libpq so that it can proceed with the connection sequence. Use PQsocket(conn) to obtain the descriptor of the socket underlying the database connection. (Caution: do not assume that the socket remains the same across PQconnectPoll calls.) Loop thus: If PQconnectPoll(conn) last returned PGRES_POLLING_READING, wait until the socket is ready to read (as indicated by select(), poll(), or similar system function). Then call PQconnectPoll(conn) again. Conversely, if PQconnectPoll(conn) last returned PGRES_POLLING_WRITING, wait until the socket is ready to write, then call PQconnectPoll(conn) again. On the first iteration, i.e., if you have yet to call PQconnectPoll, behave as if it last returned PGRES_POLLING_WRITING. Continue this loop until PQconnectPoll(conn) returns PGRES_POLLING_FAILED, indicating the connection procedure has failed, or PGRES_POLLING_OK, indicating the connection has been successfully made.
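To make the caution in the quoted passage concrete, here is a minimal sketch of the loop it describes. It is not ozo's code and not a proposed patch: `PollStatus`, `WaitFor`, `ConnectOps` and `drive_connect` are hypothetical names, and the three callbacks stand in for `PQconnectPoll(conn)`, `PQsocket(conn)` and a blocking `poll()`/`select()` wait so that the sketch compiles without libpq. The only point it illustrates is that the descriptor is fetched again on every iteration instead of being cached.

```cpp
#include <functional>

// Hypothetical stand-ins for libpq's polling statuses (PGRES_POLLING_*),
// so the sketch compiles without libpq headers.
enum class PollStatus { Failed, Reading, Writing, Ok };
enum class WaitFor { Read, Write };

// Hypothetical bundle of the three operations the documented loop needs.
struct ConnectOps {
    std::function<PollStatus()> poll;        // would wrap PQconnectPoll(conn)
    std::function<int()> socket;             // would wrap PQsocket(conn)
    std::function<bool(int, WaitFor)> wait;  // would wrap poll()/select(); false on timeout or error
};

// Drives the connection sequence; returns true once the status is Ok.
inline bool drive_connect(const ConnectOps& ops) {
    // Per the documentation: before the first poll, behave as if it returned WRITING.
    PollStatus status = PollStatus::Writing;
    for (;;) {
        if (status == PollStatus::Ok) return true;
        if (status == PollStatus::Failed) return false;

        // Re-query the descriptor on every iteration: it may differ from the
        // one returned before the previous poll call.
        const int fd = ops.socket();
        if (fd < 0) return false;

        const WaitFor direction =
            status == PollStatus::Reading ? WaitFor::Read : WaitFor::Write;
        if (!ops.wait(fd, direction)) return false;

        status = ops.poll();
    }
}
```

With real libpq, the callbacks would presumably be thin lambdas around `PQconnectPoll`, `PQsocket` and a `poll()` call with a timeout. In an asynchronous design like ozo's, the equivalent step would be re-assigning the watched descriptor before each wait.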
In my environment (a local PostgreSQL 12 server running in a Docker container), the connection always times out. After debugging, I found that the code below fixes the issue, although it is probably a huge hack. I am not familiar enough with the ozo codebase to feel confident in this fix:
Unrelated, but it is also worth noting that the null_buffers method of waiting for the socket to become ready is deprecated in Boost.Asio. The preferred method is to use socket_.async_wait.
Hi!
Thanks for reaching out. Well, yes, it definitely violates the documentation in the case of a multi-host connection string. Do you use multiple hosts in your connection string, or do you see the issue with a single host?
This is a nasty bug. As far as I understand, PQconnectPoll() may call closesocket() on the underlying socket between calls. So PQconnectPoll() -> poll(PQsocket(conn)) is how it normally goes. ozo seems to cache the result of PQsocket(conn) and reuse the fd, though. Because libpq closes the fd, and fd numbers may end up being reused, any application that creates other fds can run into this issue, along with any other problems that come with polling on fds that don't belong to ozo. Unfortunately, it manifests in a way that is somewhat hard to debug.
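The fd-reuse hazard is easy to reproduce in isolation. The sketch below is only an illustration and involves neither libpq nor ozo; `ReuseResult` and `demonstrate_fd_reuse` are hypothetical names. It relies on the POSIX rule that a new descriptor gets the lowest available number, so in a single-threaded process a socket opened right after a close receives the number that was just freed.

```cpp
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical result type: the descriptor numbers of two sockets
// opened one after the other.
struct ReuseResult {
    int first;
    int second;
};

// Opens a socket, closes it, then opens another one.
// Returns {-1, -1} if the first socket could not be created.
inline ReuseResult demonstrate_fd_reuse() {
    // Plays the role of the descriptor a caller cached from PQsocket(conn).
    const int first = ::socket(AF_UNIX, SOCK_STREAM, 0);
    if (first < 0) return {-1, -1};

    // Plays the role of libpq closing its socket inside PQconnectPoll().
    ::close(first);

    // Any unrelated descriptor opened afterwards can take over the old number.
    const int second = ::socket(AF_UNIX, SOCK_STREAM, 0);
    if (second >= 0) ::close(second);

    return {first, second};
}
```

A wait registered on the cached number would then be watching whatever now owns that number, not the libpq connection.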
I would expect this to cause other problems too, but so far I've only seen the PGRES_POLLING_OK event getting dropped, while some PGRES_POLLING_WRITING and PGRES_POLLING_READING events are still caught. So I probably don't understand the whole situation, and my explanation above may be somewhat off. Still, probably not great.
https://www.postgresql.org/docs/current/libpq-connect.html