ECONNRESET with Node.js and PostgreSQL

Hi there,

I’m using Postgres.js with Node.js and a PostgreSQL service:

// database.js
import postgres from 'postgres';

const sql = postgres();

export async function getCourses() {
  const courses = await sql`
    SELECT * FROM courses
  `;
  return courses;
}

This works for the most part, but after a while the connection fails with a mysterious ECONNRESET error, and then all database queries fail for a few minutes (maybe until Postgres.js reconnects?). Unfortunately, the errors do not shed much light on the cause of the problem:

Apr 27 10:22:53 PM  node:internal/process/promises:246
Apr 27 10:22:53 PM            triggerUncaughtException(err, true /* fromPromise */);
Apr 27 10:22:53 PM            ^
Apr 27 10:22:53 PM  
Apr 27 10:22:53 PM  Error: read ECONNRESET    at TCP.onStreamRead (node:internal/stream_base_commons:211:20) {
Apr 27 10:22:53 PM    errno: -104,
Apr 27 10:22:53 PM    code: 'ECONNRESET',
Apr 27 10:22:53 PM    syscall: 'read'
Apr 27 10:22:53 PM  }
Apr 27 10:22:54 PM  error Command failed with exit code 1.

In an issue on the Postgres.js repo, there is a discussion about lowering the connection timeout from its default of 30 seconds to 10 seconds; I’ve sketched what that change might look like below.
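If I understand the thread correctly, the change would be a small tweak to the connection options (this is only a sketch, and whether connect_timeout is the exact option being discussed in that issue is my assumption):

import postgres from 'postgres';

// Sketch: lower the connection timeout from the 30-second default to 10 seconds.
const sql = postgres({
  connect_timeout: 10, // seconds to wait when establishing a new connection
});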

Before I do this, I’d like to hear from the Render support team whether any such connection timeouts exist on Render’s side, whether there is any further configuration I can do here, or whether there are general recommendations. I will also keep debugging on my side to make sure I’m not doing something wrong.

I never had any issues with Postgres.js with Heroku’s PostgreSQL service.

Edit: Opened a new issue on the Postgres.js repo: ECONNRESET after a while, reconnecting takes a few minutes :( · Issue #179 · porsager/postgres · GitHub

Thanks!

So I may have made some progress here.

It seems like on Render, idle (persistent) PostgreSQL connections are being automatically closed (terminated) after some time.

This behavior is a problem for the Postgres.js client library, because it keeps connections open by default:

Connections will by default not close until .end() is called

So reading further, there is another option called idle_timeout:

but often it is useful to have them close when there is no activity or if using Postgres.js in eg. Lamdas / Serverless environments. This can be done using the idle_timeout option to specify the amount of seconds to wait before automatically closing an idle connection.

When I set the idle_timeout option in Postgres.js to a low number of seconds, the ECONNRESET errors stopped (at least, they haven’t shown up for a few hours now).
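For reference, the change is just one extra connection option (the value here is an example I picked, not a recommendation from Render or the Postgres.js docs):

import postgres from 'postgres';

// Close connections that have been idle for 20 seconds.
// 20 is an arbitrary "low" value; tune it for your workload.
const sql = postgres({
  idle_timeout: 20,
});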

So the question remains for the Render support team: why does this termination happen? Is there a way to configure Render PostgreSQL like Heroku’s, so that idle connections can stay open?

cc @dan @shantanu @jake


Hi Karl,

Thanks for sharing your findings. I’d love to dig into this more and will reach out over DM to collect more info.

I believe the server will close the connection when it hasn’t seen any activity for approximately 2 hours and 11 minutes. That number comes from a combination of the following kernel settings:

net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200 

If there’s no activity on a TCP connection for 2 hours (tcp_keepalive_time), the server sends up to 9 keepalive probes, one every 75 seconds. If the connection peer (the client) doesn’t reply to any of the probes, the server terminates the connection; that adds up to 7200 + 9 × 75 = 7875 seconds, or roughly 2 hours and 11 minutes. It seems that replying to the empty keepalive ACK is optional in the spec.

To test this out, I issued requests to the db 3 hours apart and noticed a bunch of ECONNRESET errors in my logs.

May 07 19:49:20Z test-conn-reset worker-95lpp Error Error: read ECONNRESET    at TCP.onStreamRead (internal/stream_base_commons.js:209:20) {
May 07 19:49:20Z test-conn-reset worker-95lpp   errno: -104,
May 07 19:49:20Z test-conn-reset worker-95lpp   code: 'ECONNRESET',
May 07 19:49:20Z test-conn-reset worker-95lpp   syscall: 'read'
May 07 19:49:20Z test-conn-reset worker-95lpp }

(The same error appeared several times at once, one per affected connection, with the field lines interleaved in the raw log output.)

I think the best way to handle this scenario is to do what you are already doing: set an idle timeout. Setting the idle timeout to something less than 2 hours and 11 minutes should help you avoid this issue.

Thanks for all the helpful info that made reproducing the issue easier!


Deployed a commit with idle_timeout: 7200 earlier today - so far, looking good.

I’ll post an update tomorrow or Friday with a final answer on whether this is an effective solution (and mark your post as the Solution).


Ok, it just failed again with idle_timeout: 7200, so I’m trying a new, lower value (idle_timeout: 6000).

idle_timeout: 6000 is also failing intermittently; I will try 3000.

Just had another failure with 3000, so I’m going to reduce this to a low number of seconds, like 100.

It’s unfortunate that we cannot keep a persistent connection to the database. @shantanu, is there anything you could tweak instead to allow these persistent connections without any idle_timeout configuration at all?

As a side note, I never had any issues with this on Heroku’s (free!) PostgreSQL service, and there was no need for any idle_timeout configuration.

@shantanu please let me know if there’s anything else that can be done (e.g. some configuration?) from Render’s side.

It would be nice to have parity with Heroku’s free databases.

I wanted to summarize a few of my observations based on the code in two repositories: postgresjs-conn-test (Node.js + Postgres.js) and conn-reset-go (Go).

I created a database in Render’s Europe region and deployed both the above repos as background workers.

For the postgresjs-conn-test repo, I was able to consistently trigger the ECONNRESET log by setting the idleTimeout to 1 hour and querying the db every 50 minutes. I observed that I got the ECONNRESET on every other run, i.e. once every 100 minutes. A simplified sketch of that test loop is below.
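This is roughly the shape of the Node.js test (a simplified sketch of the setup described above, not the exact repo code):

import postgres from 'postgres';

// Simplified sketch of the test: allow connections to stay idle for up to 1 hour,
// and run one query every 50 minutes, logging any failures.
const sql = postgres({ idle_timeout: 3600 });

setInterval(async () => {
  try {
    const [row] = await sql`SELECT now() AS now`;
    console.log('query ok at', row.now);
  } catch (err) {
    console.error('query failed:', err.code, err.message);
  }
}, 50 * 60 * 1000);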

I have yet to see a similar error in the conn-reset-go service. Both apps point at the exact same database and make the same number of requests at the same intervals.

I also tested the same setup in Render’s US region, and I haven’t been able to replicate the issue there with either the Node or the Go app.

Putting everything together, it seems to me that the most robust solution is to handle this on the client side. I notice that Go’s standard library already accounts for such behavior here (a rough Node.js equivalent is sketched after the Go snippet):

// QueryContext executes a query that returns rows, typically a SELECT.
// The args are for any placeholder parameters in the query.
func (db *DB) QueryContext(ctx context.Context, query string, args ...interface{}) (*Rows, error) {
	var rows *Rows
	var err error
	for i := 0; i < maxBadConnRetries; i++ {
		rows, err = db.query(ctx, query, args, cachedOrNewConn)
		if err != driver.ErrBadConn {
			break
		}
	}
	if err == driver.ErrBadConn {
		return db.query(ctx, query, args, alwaysNewConn)
	}
	return rows, err
}
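On the Node.js side, the same idea would be a small retry wrapper around queries. Here is a rough sketch (my own illustration, not something built into Postgres.js; depending on your client version, other connection-level error codes may also be worth retrying):

import postgres from 'postgres';

const sql = postgres({ idle_timeout: 20 });

// Retry a query a couple of times if it fails with a connection-level error
// such as ECONNRESET; rethrow anything else (e.g. SQL errors) immediately.
async function queryWithRetry(run, retries = 2) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await run();
    } catch (err) {
      if (err.code !== 'ECONNRESET') throw err;
      lastError = err;
    }
  }
  throw lastError;
}

// Usage:
// const courses = await queryWithRetry(() => sql`SELECT * FROM courses`);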