v2.44.0.cli-migrations-v3: No retry for postgres database connection, container frozen #10595

raweber42 · 2024-11-13T13:22:08Z

Version Information

Server Version:
CLI Version (for CLI related issue): v2.44.0.cli-migrations-v3

Environment

self-hosted

What is the current behaviour?

We are running Hasura in our Kubernetes cluster, with an ephemeral Postgres DB in a container deployed next to it. On startup, Hasura comes up faster than the Postgres DB, so I naturally see this log entry:

{"error":"connection error","path":"$","code":"postgres-error","internal":"connection to server at \"localhost\" (127.0.0.1), port 5433 failed: Connection refused\n\tIs the server running on that host and accepting TCP/IP connections?\nconnection to server at \"localhost\" (::1), port 5433 failed: Cannot assign requested address\n\tIs the server running on that host and accepting TCP/IP connections?\n"}

The problem is that Hasura does not retry the Postgres connection. There is no additional logging until the Hasura container gets killed once HASURA_GRAPHQL_MIGRATIONS_SERVER_TIMEOUT is reached. When Kubernetes then spins up a new container, the Postgres DB is ready and everything works fine.

I don't remember having a similar issue before, and we've been using the same setup for several months now, so this might be a regression.

The central question here is: Is there a retry mechanism for the database connection of the temporary server that's created by the cli-migrations-v3 image? From what I can see, there is not. Even when running the [tests](https://github.com/hasura/graphql-engine/tree/master/packaging/cli-migrations/v3/test) in the hasura repo, I see the same behavior if the Postgres DB is not already available when Hasura starts up.

I am willing to contribute to the project to fix this, if necessary!

What is the expected behaviour?

The container retries the DB connection at an (optionally configurable) interval.
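
To make the expected behaviour concrete, here is a minimal Python sketch of a retry loop with a configurable interval and overall timeout. It is only an illustration of the idea, not how the cli-migrations-v3 image works or should be patched: `connect_with_retry`, `interval_s`, and `timeout_s` are hypothetical names, and `connect` stands in for whatever actually opens the database connection.

```python
import time
from typing import Callable


def connect_with_retry(
    connect: Callable[[], object],
    interval_s: float = 2.0,
    timeout_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> object:
    """Call `connect` until it succeeds or the overall timeout would be exceeded.

    All names here are hypothetical; `connect` is any callable that raises
    ConnectionError (e.g. ConnectionRefusedError) while the DB is not ready.
    """
    deadline = clock() + timeout_s
    attempt = 0
    while True:
        attempt += 1
        try:
            return connect()
        except ConnectionError as exc:
            # Give up if waiting one more interval would pass the deadline.
            if clock() + interval_s > deadline:
                raise TimeoutError(f"gave up after {attempt} attempt(s)") from exc
            # Log every failed attempt so the container is visibly not frozen.
            print(f"attempt {attempt} failed: {exc}; retrying in {interval_s}s")
            sleep(interval_s)
```

Logging each failed attempt would also address the "no additional logging" part of the problem, since the container would show that it is still waiting for the database.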

How to reproduce the issue?

1. Use the docker-compose file from the [test folder](https://github.com/hasura/graphql-engine/blob/master/packaging/cli-migrations/v3/test/docker-compose.yaml).
2. Run docker-compose up.
3. Check the logs of the hasura container and see that there is no retry for the database connection.
4. See that the container gets killed once HASURA_GRAPHQL_MIGRATIONS_SERVER_TIMEOUT has been reached.

Screenshots or Screencast

Please provide any traces or logs that could help here.

In the [test](https://github.com/hasura/graphql-engine/blob/master/packaging/cli-migrations/v3/test/test.sh) section of the image I can see that the Postgres DB is spun up before the Hasura instance. Maybe it's a coincidence, but it supports my suspicion that there is no repeated check of the DB connection in the Hasura instance.

Any possible solutions/workarounds you're aware of?

Implement polling/retrying for the database connection.
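
As a possible stopgap until something like this exists in the image, one could poll the database port before starting Hasura. The sketch below is an assumption-laden illustration, not a tested workaround for this setup: `wait_for_port` is a hypothetical helper, and the host and port in the usage note are taken from the log entry above. Note that an open TCP port only shows that something is listening, not that Postgres is ready to serve queries.

```python
import socket
import time


def wait_for_port(host: str, port: int, interval_s: float = 1.0, timeout_s: float = 60.0) -> bool:
    """Poll a TCP port until it accepts a connection or `timeout_s` elapses.

    Hypothetical helper: returns True once a connection succeeds,
    False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # while nothing is listening on the port yet.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

For the setup described here, that would be a call such as `wait_for_port("localhost", 5433)`, run before the Hasura process starts.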

Keywords

auto-migrate, cli-migrations-v3, database, postgres, metadata

raweber42 added the k/bug Something isn't working label Nov 13, 2024
