Database is closing unexpectedly - PostgreSQL on Render

I’m hosting a Django app with PostgreSQL, both on Render.
On a daily basis I’m receiving these errors:

OperationalError
server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request.

I’m trying to patch things with jdelic/django-dbconn-retry (it patches Django to reconnect once on a failed database connection before failing, which helps when running the Django ORM through HAProxy, for example). Hopefully it will help.
This is the first time I’ve encountered these errors. I have hosted multiple projects on Heroku with the same setup and never ran into them.

How can I investigate what is happening, so I can prevent these kinds of errors in the future?
Is Render using some specific setup for managed databases?
Thanks.

Hello @mbojcic-dacommo,

Thanks for reporting this error, I’m happy to look into this further.

When was the last time you received this error? Did adding django-dbconn-retry seem to fix the issue?

Hi @danielle,
sorry for the late response.
Unfortunately it didn’t help. Here’s a screenshot of all the issues I see in Sentry.

@danielle I think I have found a temporary solution.
In my health check API, I’ve added a simple DB query and the errors have stopped happening.
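A minimal sketch of what such a health check query could look like. It assumes a DB-API style connection object; `database_is_healthy` is a hypothetical helper name, and `sqlite3` is used only as a stand-in so the example runs without a Postgres server.

```python
import sqlite3
from contextlib import closing


def database_is_healthy(connection) -> bool:
    """Return True if a trivial query succeeds on the given connection."""
    try:
        # closing() guarantees the cursor is released even if the query fails.
        with closing(connection.cursor()) as cursor:
            cursor.execute("SELECT 1")
            return cursor.fetchone() is not None
    except Exception:
        # Any driver error (for example OperationalError) counts as unhealthy.
        return False


if __name__ == "__main__":
    # Stand-in connection; a real check would use the app's Postgres connection.
    with closing(sqlite3.connect(":memory:")) as conn:
        print(database_is_healthy(conn))
```

In a Django project the same helper could presumably be called with `django.db.connection` from the health check view, returning a 200 or 503 response depending on the result. Because the platform calls the health check regularly, the query also keeps the connection from sitting idle.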

It still throws errors when using dbconn-retry, but not if connections are kept open by pinging the DB every 5 seconds. It looks like it has trouble creating connections to the DB after a prolonged period without connecting.

Why is this happening?

Hi @mbojcic-dacommo, do you have any issues connecting to your database from outside the context of your Django app? For example, if you use psql from the command line with your database’s external connection string, do you see similar issues? (You can find the external connection string in the Render dashboard page for your database.)

Both psql and pgAdmin are working fine.

Hi @mbojcic-dacommo,

If the psql command and pgAdmin are working fine, then something in the Django application might be the culprit. Did you come across the “Deploy Django on Render” guide when you were setting up your application? If you set up your application this way, can you confirm whether the connection string you are using is the internal connection string provided on the dashboard for your Postgres database?

No, it isn’t. The configuration is OK and the app works fine on Heroku. I’m only getting these random errors on Render.

After getting “SSL connection has been closed unexpectedly”, this is what showed up in the logs:

Reconnecting to the database didn’t help could not connect to server: Connection refused
Is the server running on host “frankfurt-postgres.render.com” (18.156.150.184) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host “frankfurt-postgres.render.com” (3.126.175.201) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host “frankfurt-postgres.render.com” (3.126.218.175) and accepting
TCP/IP connectio…

We seem to have much the same problem here. Every morning (3 days in a row) around 9 AM CEST, all Postgres connections from Render services to our RDS-hosted DB time out. External services don’t have this problem, so it seems like there is another Render network issue here.

@tyler the issue occurred again.
The server had been pinging the DB for 4 weeks straight with no errors.

It happened again on Aug 26, 2021 12:11:52 AM UTC

Is the server running on host "frankfurt-postgres.render.com" (3.126.218.175) and accepting
	TCP/IP connections on port 5432?
could not connect to server: Connection refused
	Is the server running on host "frankfurt-postgres.render.com" (18.156.150.184) and accepting
	TCP/IP connections on port 5432?
could not connect to server: Connection refused
	Is the server running on host "frankfurt-postgres.render.com" (3.126.175.201) and accepting
	TCP/IP connections on port 5432?...

Hi @mbojcic-dacommo, I’m sorry this is happening again. Do you mind if I take a look at your environment variables to make sure we can rule out any connection string issues?

Can I send you the DB settings and DB-related env variables by email?

Yes please email those details to support@render.com and I will take a look!

We also see Postgres connection issues every once in a while, either “can’t reach oregon” or “connection was closed”. We’re using Prisma, which is a younger PG library, so I assumed the issue was there, but it definitely fits the pattern others are describing here.

We have added retry middleware, which has mostly fixed it, but we still occasionally get errors when the retry fails.

@tyler Do you have any updates?

Hi @mbojcic-dacommo, my apologies for the late response. We were able to track down a correlation between your connection issues and a minor issue in our system with how we handle connections after deploying one of our proxies. The issue should be resolved.

Hi Tyler, it occurred again on Sep 14, 2021 9:51:28 PM UTC

Hey @mbojcic-dacommo,

This lines up with when we rolled out a new version of the underlying infrastructure. When doing this, all active connections will be closed and new ones must be established by the clients. I am surprised the reconnect library isn’t working for you, since we bring up the new infrastructure before taking down the old infra. There should never be a point when your database is inaccessible. Have you verified that the reconnection works properly locally when the database is restarted?

It sounds like both your application and your database are running in the Frankfurt region, so you should be able to use the internal connection string instead of the external connection string. This will bypass this infrastructure altogether and you shouldn’t see this connection issue going forward. You will also see better performance when using that connection string.
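As a rough illustration of wiring a connection string into Django settings, here is a sketch that uses only the standard library. The function names, the example credentials, and the `dpg-example-a` internal hostname are hypothetical; the `.render.com` suffix check is an assumption based on the external hostname seen in the logs above (`frankfurt-postgres.render.com`).

```python
from urllib.parse import unquote, urlparse


def django_database_from_url(url: str, conn_max_age: int = 600) -> dict:
    """Turn a postgres:// connection string into a Django DATABASES entry."""
    parts = urlparse(url)
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": parts.path.lstrip("/"),
        "USER": unquote(parts.username or ""),
        "PASSWORD": unquote(parts.password or ""),
        "HOST": parts.hostname or "",
        "PORT": str(parts.port or 5432),
        # Reuse connections for up to conn_max_age seconds instead of
        # opening a new one per request.
        "CONN_MAX_AGE": conn_max_age,
    }


def looks_external(host: str) -> bool:
    """Assumption: external hostnames end in '.render.com'."""
    return host.endswith(".render.com")


if __name__ == "__main__":
    # Hypothetical connection string; real values come from the dashboard.
    config = django_database_from_url(
        "postgres://app_user:s3cret@dpg-example-a/app_db"
    )
    if looks_external(config["HOST"]):
        print("warning: using the external connection string")
    print(config["HOST"], config["PORT"])
```

In `settings.py` this could be used as `DATABASES = {"default": django_database_from_url(os.environ["DATABASE_URL"])}`, where the `DATABASE_URL` variable name is also an assumption about how the app is configured.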

Thanks for the update, @jake.
The retry library tries to reconnect just once; maybe it takes longer for the database to become reachable from our side.
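If a single reconnect attempt is not enough, one option is to retry several times with an increasing delay. The sketch below is not how django-dbconn-retry works; `call_with_retries` and its parameters are hypothetical, and it retries on `ConnectionError` only so that it runs without a database driver.

```python
import time


def call_with_retries(operation, retry_on=(ConnectionError,), attempts=5,
                      base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on retry_on errors."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                # Out of attempts: let the last error propagate to the caller.
                raise
            sleep(delay)
            # Double the wait each time, capped at max_delay.
            delay = min(delay * 2, max_delay)


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_query():
        # Simulates a database that refuses the first two connections.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("could not connect to server")
        return "ok"

    print(call_with_retries(flaky_query, base_delay=0.01))
```

With Django, `retry_on` would presumably be `(django.db.OperationalError,)`, and the broken connection would likely need to be discarded before each retry (for example with `django.db.connection.close()`) so that the next query opens a fresh one.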

Thanks for the internal connection string tip; we have switched to using it.
Hopefully the errors will be gone now.