Mysterious timeouts and Postgres errors on a Rails app

Hi,

A couple of months ago I switched https://amiibo.life from Heroku to Render.

Since then, I’ve noticed occasional errors about Postgres connections being unexpectedly terminated and, lately, seemingly random timeouts and oddly long loading times (sometimes 15s+ for the homepage). The Render dashboard just reported that my web container is unhealthy because of the occasional timeouts.

I suspect the Postgres errors might be because of misconfigured connection pools, but I haven’t found a combination of settings that resolves it. If I’m hosting both the web app and the database on Render, what settings does Render recommend for getting Rails and Postgres to play nice? I suspect Heroku did something automatically here that Render does not.
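
To make the pool question concrete, here's a rough sketch of the arithmetic as I understand it. It assumes Puma, where each process keeps its own ActiveRecord pool and each thread holds at most one connection at a time. `max_db_connections` is a hypothetical helper I made up for illustration, `WEB_CONCURRENCY` and `RAILS_MAX_THREADS` are the conventional env var names (your setup may use others), and the fallback values are placeholders.

```ruby
# Hypothetical helper (not part of Rails or Puma): estimates the most
# Postgres connections a Puma-served Rails app could hold open at once.
def max_db_connections(workers:, threads_per_worker:)
  # workers = 0 means Puma runs a single process, so count at least one.
  processes = [workers, 1].max
  # Assumption: the pool in database.yml is sized to match the thread count,
  # so each process can open up to threads_per_worker connections.
  processes * threads_per_worker
end

# Assumed env var names; the defaults here are placeholders.
workers = Integer(ENV.fetch("WEB_CONCURRENCY", "0"))
threads = Integer(ENV.fetch("RAILS_MAX_THREADS", "5"))

puts "Up to #{max_db_connections(workers: workers, threads_per_worker: threads)} connections"
```

If that estimate is right, the `pool` value in `database.yml` should be at least the per-process thread count, and the total should stay under whatever connection limit the database plan allows.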

Beyond that, what can I do to troubleshoot this weird behavior between my Render web and database containers?

Thanks!

Hi John,

Thanks for reaching out (and nice site, I’m a bit of an Amiibo collector myself).

Would you be able to share some examples of the issue you are experiencing? That may help us better troubleshoot it with you, e.g. any logs/errors/output with timestamps (and timezone) and any specific URLs (if there’s a pattern), etc.

If you don’t want to share these details on the community forum, please feel free to raise a ticket with support@render.com

Kind regards

Alan

Hi Alan,

Sure, happy to share. I’ll do so here so others with a similar problem searching can find it.

The two most common errors are:

ActiveRecord::StatementInvalid
GET /
PG::ConnectionBad: PQconsumeInput() connection to server at "dpg-ca37l0f9re0psiboj0lg-a" (10.205.97.194), port 5432 failed: FATAL: the database system is in recovery mode

SSL connection has been closed unexpectedly
: SET client_min_messages TO 'warning'
Aug 8th, 2022, 22:56:10 UTC

STACKTRACE

gems/activerecord-4.2.11.1/lib/active_record/connection_adapters/postgresql/database_statements.rb:155:in `async_exec': PG::ConnectionBad: PQconsumeInput() connection to server at "dpg-ca37l0f9re0psiboj0lg-a" (10.205.97.194), port 5432 failed: FATAL: the database system is in recovery mode SSL connection has been closed unexpectedly : SET client_min_messages TO 'warning' (ActiveRecord::StatementInvalid)
from gems/activerecord-4.2.11.1/lib/active_record/connection_adapters/postgresql/database_statements.rb:155:in `block in execute'
[snip]
from gems/puma-3.12.6/lib/puma/thread_pool.rb:135:in `block in spawn_thread'

Recent instances:

  • Aug 8th, 2022, 22:56:10 UTC
  • Aug 8th, 2022, 06:41:30 UTC
  • Aug 7th, 2022, 17:09:20 UTC

…and…

ActionView::Template::Error
games#show
PG::UnableToSend: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Aug 4th, 2022, 15:52:36 UTC

STACKTRACE

gems/activerecord-4.2.11.1/lib/active_record/connection_adapters/postgresql_adapter.rb:602:in `exec_prepared': PG::UnableToSend: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. : [snip SQL query] (ActionView::Template::Error)
from gems/activerecord-4.2.11.1/lib/active_record/connection_adapters/postgresql_adapter.rb:602:in `block in exec_cache'
[snip]
from gems/puma-3.12.6/lib/puma/thread_pool.rb:135:in `block in spawn_thread'

Recent instances:

  • Aug 4th, 2022, 15:52:36 UTC
  • Aug 4th, 2022, 15:52:35 UTC
  • Aug 3rd, 2022, 07:20:04 UTC

I can also email some full backtraces if that’ll help.

@al_ps My previous message was stuck in moderation for a bit and eventually appeared, so you may have missed it. Thanks in advance!

Hi @al_ps, looks like the mysterious timeouts have returned this morning, and I haven’t changed anything with the site’s code or config in days. My uptime reports are saying it’s going up and down every few minutes due to connection timeouts, and I’m seeing the same myself when I visit the site.

I don’t know if the timeouts are related to the Postgres errors I posted above or something separate.

Hi John,

Apologies for the delay getting back to you.

I can see the Postgres instance is consistently running fairly close to the 256MB of RAM available on a Starter DB. Maybe that could be causing some instability?

However, I’ll ask the team to see if they can see anything else going on here.

Thanks for your patience.

Alan

Hi John,

Thanks again for your patience.

I’ve had feedback from the engineers. They’re not currently finding anything on our side that correlates with the instability you’re experiencing. They also think the issues may be related to connections bumping up against the memory limit of the Starter Postgres plan.

Maybe the Standard Postgres plan would be worth a try to see if that helps. However, please be aware that DB plan downgrades are not yet possible. If you wanted to try Standard and downgrade back to Starter later, you would need to create a new instance and restore a backup.

Please let us know if we can assist any further.

Alan

Thanks, Alan. Besides the database getting close to the Postgres RAM limit, it looks like the web instance was also hitting its RAM limit of 512 MB. I suspect this was causing web processes or threads to be killed/restarted/whatever, in turn causing the database connection weirdness.

I think I’ve solved this by switching the WEB_CONCURRENCY environment variable from the suggested default of 4 to 0, which I got from here and here. This has brought the web instance down below 512 MB of RAM usage consistently. I tried 2 first but even that was too much. 0 apparently puts Puma into “single mode”, which uses even less RAM (see the first link for more info). I expect this should be sufficient to handle this app’s low-ish amount of traffic. (I have RAILS_MAX_THREADS at 2, though I’m not sure how much difference that makes here.)
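
For anyone who wants to see what that change amounts to, here's a minimal sketch of how those two env vars map onto Puma's modes. `puma_settings` is a hypothetical helper I wrote for illustration (it isn't part of Puma), and the comments at the bottom assume a `config/puma.rb` that reads the same env vars, which may not match your file.

```ruby
# Hypothetical helper (not part of Puma): shows which mode Puma would run in
# for a given environment. Accepts ENV or a plain Hash for easy experimenting.
def puma_settings(env = ENV)
  workers = Integer(env.fetch("WEB_CONCURRENCY", "0"))
  max_threads = Integer(env.fetch("RAILS_MAX_THREADS", "2"))
  {
    workers: workers,
    max_threads: max_threads,
    # 0 workers: one process serves everything ("single mode").
    # 1 or more: a master process forks that many workers ("cluster mode").
    mode: workers.zero? ? :single : :cluster
  }
end

before = puma_settings("WEB_CONCURRENCY" => "4", "RAILS_MAX_THREADS" => "2")
after  = puma_settings("WEB_CONCURRENCY" => "0", "RAILS_MAX_THREADS" => "2")
puts "before: #{before[:mode]} mode, #{before[:workers]} workers"
puts "after:  #{after[:mode]} mode, #{after[:workers]} workers"

# In config/puma.rb, the equivalent lines would look roughly like this
# (assuming the file reads these env vars):
#   workers Integer(ENV.fetch("WEB_CONCURRENCY", "0"))
#   max_threads = Integer(ENV.fetch("RAILS_MAX_THREADS", "2"))
#   threads max_threads, max_threads
```

Each worker in cluster mode is a separate process with its own copy of the app in memory, which is presumably why dropping to single mode made such a difference on a 512 MB instance.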

So I think that’s things solved for me. Thanks, @al_ps! Hopefully this can help others with the same or similar issues.
