Intermittent weird connection error on shared Postgres

Hi All,

I'm seeing some really weird behaviour: connections to my (shared) Postgres database fail intermittently, for no apparent reason. I'm trying to import a small data set (around 50 rows) through Hasura (2.0.0.beta.2).
I have a small private network on render.com with this Hasura instance and an Express.js service for internal logic, each running as a Docker container. Only Hasura is connected to the database. By default it supports 50 concurrent connections and has a wait queue in case of overload. Render.com says the limit for Postgres instances (whatever the plan) is 97 concurrent connections, which is almost twice what Hasura could use, so we should be fine. And of course, it was working yesterday.
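To make the pool argument concrete, here is a conceptual TypeScript sketch of a connection pool capped at a fixed size, with a FIFO wait queue for overflow requests. `BoundedPool` and `sleep` are hypothetical illustrations, not Hasura's implementation; `sleep` stands in for a query, and the numbers 50 and 97 are the figures quoted above.

```typescript
// Conceptual sketch only: this is NOT Hasura's implementation. It shows how a
// pool capped at `maxSize` slots queues extra requests instead of opening
// more connections to Postgres.
class BoundedPool {
  private inUse = 0;
  private peak = 0;
  private readonly waiters: Array<() => void> = [];

  constructor(private readonly maxSize: number) {}

  // Takes a free slot immediately, or waits in FIFO order for one.
  private acquire(): Promise<void> {
    if (this.inUse < this.maxSize) {
      this.inUse++;
      this.peak = Math.max(this.peak, this.inUse);
      return Promise.resolve();
    }
    return new Promise<void>((resolve) => {
      this.waiters.push(() => resolve());
    });
  }

  // Hands the slot to the next waiter if there is one, otherwise frees it.
  private release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // ownership moves to the waiter, so inUse stays the same
    } else {
      this.inUse--;
    }
  }

  // Runs `task` while holding one slot; the slot is released even on error.
  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }

  // Highest number of slots held at the same time so far.
  get peakInUse(): number {
    return this.peak;
  }
}

// Figures quoted in this thread: Hasura's default pool size and Render's limit.
const HASURA_POOL_SIZE = 50;
const RENDER_PG_LIMIT = 97;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));
const pool = new BoundedPool(HASURA_POOL_SIZE);

// 200 simulated queries arrive at once; at most 50 hold a slot concurrently.
Promise.all(Array.from({ length: 200 }, () => pool.run(() => sleep(5)))).then(() => {
  console.log(`peak connections: ${pool.peakInUse} of ${RENDER_PG_LIMIT} allowed`);
});
```

With this design the pool never holds more than `maxSize` slots, however many requests arrive; queued requests simply wait longer. That is why, on paper, a 50-connection pool should fit comfortably within a 97-connection limit.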

By intermittent I mean that the import fails roughly halfway through, every time: it works for a while, then fails.

Here are the logs (from Hasura):

{"type":"http-log","timestamp":"…","level":"error","detail":{"operation":{"user_vars":{…},"error":{"internal":"no connection to the server\n","path":"$","error":"connection error","code":"postgres-error"}
{"type":"pg-client","timestamp":"…","level":"warn","detail":{"message":"postgres connection failed, retrying(1)."}}
{"type":"scheduled-trigger","timestamp":"…","level":"error","detail":{"internal":"FATAL: the database system is in recovery mode\nFATAL: the database system is in recovery mode\n","path":"$","error":"connection error","code":"postgres-error"}}
{"type":"http-log","timestamp":"…","level":"error","detail":{"operation":{"user_vars":{…},"error":{"internal":"no connection to the server\n","path":"$","error":"connection error","code":"postgres-error"},"request_id":"…","response_size":106,"query":…}
{"type":"pg-client","timestamp":"…","level":"warn","detail":{"message":"postgres connection failed, retrying(1)."}}

So it really looks like the problem comes from the Postgres instance itself, especially "FATAL: the database system is in recovery mode", which clearly comes from Postgres.

Any clue? :slight_smile:

Thanks a lot.
(By the way, I'm not sure this is the right place to post this, but I couldn't find anywhere else.)

Hello Gilles,

I investigated your database, and it seems it went into recovery mode for about 28 hours; everything then appears to have returned to normal around 2021-06-17 18:18:45 UTC.

We are continuing to investigate and will get back to you shortly.

Can you check if your app is now working as expected?

Hello Sean, yes, everything is working better now. Thanks.
By the way, I changed the way I do my import and now send the rows in batches, which drastically reduces the number of connections. So I don't know whether the problem is still there.
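A minimal sketch of that batching approach, assuming the import goes through Hasura's auto-generated `insert_<table>` mutation, which accepts an array of rows in its `objects` argument. The table `items`, the `ImportRow` fields, and the helpers `chunk`, `buildBatchRequests` and `sendSequentially` are hypothetical placeholders; the actual import code is not shown in this thread.

```typescript
// Hypothetical row shape; the real import columns are not shown in the thread.
interface ImportRow {
  name: string;
  value: number;
}

interface GraphQLRequest {
  query: string;
  variables: { objects: ImportRow[] };
}

// Hypothetical table "items": Hasura generates insert_items(objects: [...]).
const INSERT_ITEMS = `
  mutation InsertItems($objects: [items_insert_input!]!) {
    insert_items(objects: $objects) {
      affected_rows
    }
  }
`;

// Splits rows into chunks of at most `size` elements.
function chunk<T>(rows: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < rows.length; i += size) {
    out.push(rows.slice(i, i + size));
  }
  return out;
}

// Builds one GraphQL request per chunk instead of one request per row.
function buildBatchRequests(rows: ImportRow[], batchSize: number): GraphQLRequest[] {
  return chunk(rows, batchSize).map((objects) => ({
    query: INSERT_ITEMS,
    variables: { objects },
  }));
}

// Sends the batches one after another, so only one insert is in flight.
// `url` would be your internal Hasura address ending in /v1/graphql.
async function sendSequentially(url: string, requests: GraphQLRequest[]): Promise<void> {
  for (const body of requests) {
    const res = await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    });
    if (!res.ok) throw new Error(`HTTP ${res.status} from Hasura`);
    const json = await res.json();
    // GraphQL errors usually arrive in the response body, not as an HTTP error.
    if (json.errors) throw new Error(`GraphQL errors: ${JSON.stringify(json.errors)}`);
  }
}

// Usage: 50 sample rows in batches of 25 become 2 requests instead of 50.
const rows: ImportRow[] = Array.from({ length: 50 }, (_, i) => ({ name: `row-${i}`, value: i }));
const requests = buildBatchRequests(rows, 25);
console.log(`${rows.length} rows -> ${requests.length} requests`);
```

Sending the batches sequentially trades a little import speed for a predictable load: Hasura only ever needs one connection for the import at a time, rather than potentially one per row.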

Either way, I still don't know whether the problem comes from Hasura itself, because Hasura should normally manage its own pool of database connections, capped at 50 (compared with the 97 allowed by Render). And again, it's strange that too many connections (which should not be the case here, but…) would put the database into recovery mode.

If you have any news, please pass it on :wink:

Thanks again,

Gilles