Instance is crippled near 30% AVG CPU usage

Hi,
I’ve already talked to support about this, but the issue probably fell through the cracks, so I’m raising it here again. This way we might also get some insights from other users who have seen a similar issue.

We are seeing strange behavior: the server becomes crippled whenever Render’s CPU usage graph approaches 30% average (across all instances).

What does “crippled” mean?

It means that the latencies of requests going out of the instance to external services (in this case Mongo Atlas and Heroku Postgres) go through the roof.

We log these durations to an external logging service, and we can see them spike whenever Render’s CPU graph approaches 30%. For example, one such spike happened today around 8:13 PM local time (2023-01-30T18:13:00.000Z).
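The post doesn’t say how these durations are measured. A minimal sketch of one common approach, assuming the database calls are async functions, wraps each call and records its wall-clock duration. `timed` and the `report` callback are hypothetical names; `report` stands in for whatever client ships entries to the external logging service.

```javascript
const { performance } = require("node:perf_hooks");

// Run an async operation (e.g. a Mongo or Postgres query), then report its
// label, success flag and wall-clock duration. `report` is a hypothetical
// hook for your logging client; by default it prints JSON to stdout.
async function timed(label, operation, report = (entry) => console.log(JSON.stringify(entry))) {
  const start = performance.now();
  let ok = false;
  try {
    const result = await operation();
    ok = true;
    return result;
  } finally {
    // Runs on success and on failure; errors still propagate to the caller.
    report({
      label,
      ok,
      durationMs: Math.round(performance.now() - start),
      at: new Date().toISOString(),
    });
  }
}
```

Usage might look like `await timed("mongo.users.findOne", () => users.findOne({ _id: id }))`, where `users` is a hypothetical collection. Keep in mind that a timer like this runs from the moment the call is issued until the awaiting code resumes on the event loop, so if the Node process itself is busy, the recorded latency grows even while the database is idle.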

I won’t bother you with the Mongo Atlas monitoring graphs, but we triple-checked that everything there was calm and smooth. CPU, memory, and queries per second were as usual. The same goes for Heroku Postgres.

We also checked Render’s average memory utilization graph at that moment, and it was very low too, around 12%.

So we are trying to understand what else might be choking the Render instances. Could some I/O bandwidth limit of the instance be getting maxed out? If so, that is very confusing, since according to the average CPU graph we are not even close to using the instances’ full capacity.
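Nobody in the thread measured this, but one way to tell whether the bottleneck is the Node process itself rather than an instance I/O limit is to sample event-loop delay with Node’s built-in `perf_hooks.monitorEventLoopDelay`. This is an illustrative sketch under assumptions: `startEventLoopMonitor`, the 10-second reporting interval, and the `report` callback are hypothetical choices, not anything from the original setup.

```javascript
const { monitorEventLoopDelay } = require("node:perf_hooks");

// Sample event-loop delay: how late timers fire because the single JS thread
// is busy. `report` is a hypothetical hook for your logging client.
// Returns a function that stops the monitor.
function startEventLoopMonitor({ intervalMs = 10_000, report = console.log } = {}) {
  const histogram = monitorEventLoopDelay({ resolution: 20 }); // sample every 20 ms
  histogram.enable();
  const timer = setInterval(() => {
    // Histogram values are in nanoseconds; convert to milliseconds.
    report({
      meanMs: histogram.mean / 1e6,
      p99Ms: histogram.percentile(99) / 1e6,
      maxMs: histogram.max / 1e6,
    });
    histogram.reset(); // each report covers only the last interval
  }, intervalMs);
  timer.unref(); // don't keep the process alive just for monitoring
  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}
```

If the p99 delay rises at the same moments as the outbound-call latencies while the databases look idle, the single JavaScript thread is the likely constraint rather than the network or disk.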

It would be great if the Render team could help us pinpoint the problem, specifically by looking at our instances from your side around today’s spike at 2023-01-30T18:13:00.000Z. Maybe you can see some resource or parameter of the instance being maxed out that isn’t visible to us on the public Render dashboard.

Any help would be much appreciated. This is critical for us, because we fear we are overspending a significant part of our budget on instances whose CPU goes unused.

I’d also be happy to reproduce the issue on request if that would help you find the root cause.

Thank you.


Hi Yaron,

Thanks for reaching out here. Since this is related to a specific service and an existing support request, we will follow up on the existing thread. Thanks for providing the additional example as well. We will look into this and follow up with any further findings.

We’ve figured this out! 🙂
The problem was that we were using the Pro Plus instance type, which has 4 CPUs.

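But our Node Express app was using only 1 CPU. All Node apps behave this way by default, because JavaScript runs on a single thread, so if you want Node to take advantage of all 4 cores, you need some extra configuration (for example, Node’s built-in cluster module). A single-threaded process can keep roughly one of the four cores busy, about 25% of the instance’s total CPU, which would explain why things fell apart near 30% on the average CPU graph.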
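For anyone who wants to keep a multi-core instance instead, a minimal sketch of the “extra configuration” using Node’s built-in `cluster` module (Node 16+ for `cluster.isPrimary`) could look like this. `startCluster`, `resolveWorkerCount`, and the `WEB_CONCURRENCY` override are hypothetical names chosen for illustration; nothing here is a Render-specific API.

```javascript
const cluster = require("node:cluster");
const os = require("node:os");

// Number of CPU cores Node can use. os.availableParallelism() was added in
// Node 18.14 / 19.4; older versions fall back to os.cpus().length.
function availableCores() {
  return typeof os.availableParallelism === "function"
    ? os.availableParallelism()
    : os.cpus().length;
}

// How many worker processes to run: an explicit WEB_CONCURRENCY value
// (a hypothetical override, not a setting Render is known to provide),
// otherwise one worker per available core, never fewer than one.
function resolveWorkerCount(env = process.env, cores = availableCores()) {
  const override = Number.parseInt(env.WEB_CONCURRENCY ?? "", 10);
  if (Number.isInteger(override) && override > 0) return override;
  return Math.max(1, cores);
}

// The primary process only forks workers; each worker runs its own copy of
// the app, and the cluster module lets them all listen on the same port.
function startCluster(getApp, port) {
  if (cluster.isPrimary) {
    const workers = resolveWorkerCount();
    for (let i = 0; i < workers; i++) cluster.fork();
    // Replace workers that die so capacity doesn't silently shrink.
    cluster.on("exit", (worker, code, signal) => {
      console.warn(`worker ${worker.process.pid} exited (${signal || code}), forking a replacement`);
      cluster.fork();
    });
  } else {
    getApp().listen(port);
  }
}
```

With Express, usage might be `startCluster(() => app, process.env.PORT || 3000)`, since an Express app exposes `listen`. Each worker is a separate process with its own memory, so any in-process state (caches, sessions) is no longer shared; the single-CPU instance chosen in this thread avoids that trade-off entirely.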

But we took the easier path: we changed the instance type to Standard, which has only 1 CPU.

We also raised the autoscaling Target CPU to 70%.

Now everything works beautifully.

