Hi,
I’ve already talked to support about this, but the issue probably fell through the cracks, so I’m raising it here again. This way we might also get insights from other users facing a similar issue.
So we are seeing strange behavior where the server is crippled whenever Render’s average CPU usage (across all instances) gets near 30%.
What does “crippled” mean here?
It means that the latencies of requests going out of the instance to external services (in this case Mongo Atlas and Heroku Postgres) jump through the roof.
We log these durations to an external logging service, and we can see them spike when Render gets near 30% CPU. For example, this spike happened today around 8:13 PM (2023-01-30T18:13:00.000Z).
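For context, the way we measure these durations is conceptually similar to the sketch below. The helper name and the logged fields are illustrative, not our actual code; in production the log line is shipped to the external logging service rather than printed:

```python
import json
import time


def timed_call(label, fn, *args, **kwargs):
    """Run fn, measure wall-clock duration, and emit a structured log line.

    Wraps any outbound call (e.g. a Mongo or Postgres query) so its
    duration shows up in our logs with a searchable label.
    """
    start = time.perf_counter()
    try:
        return fn(*args, **kwargs)
    finally:
        duration_ms = (time.perf_counter() - start) * 1000
        # In our app this goes to the logging service; here, stdout.
        print(json.dumps({"op": label, "duration_ms": round(duration_ms, 2)}))


# Illustrative usage: time a stand-in for an outbound database query.
result = timed_call("mongo.find_user", lambda: sum(range(1000)))
```

The spikes we see are in exactly these per-call durations, which is why we’re confident the slowdown is on the outbound path and not in our request handlers.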
I won’t bother you with the monitoring graphs from Mongo Atlas, but we triple-checked that everything there was calm and smooth: CPU, memory, and queries per second were all as usual. Same for Heroku Postgres.
We’ve also looked at the average memory utilization graph in Render at that moment, and it was very low, around 12%.
So we are trying to understand what other resource might be choking the Render instances. Maybe some other I/O or network bandwidth limit of the instance is being hit? If so, it’s very confusing, since the average CPU graph shows we are not even close to using the full potential of the instances.
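To try to rule this out ourselves, we’ve been thinking about sampling the instance’s network byte counters over an interval and converting the delta into throughput. A minimal sketch of the arithmetic (the counter values below are made up; on a Linux instance they would come from something like /proc/net/dev):

```python
def throughput_mbps(bytes_before, bytes_after, interval_s):
    """Convert a byte-counter delta over a sampling interval into Mbit/s."""
    return (bytes_after - bytes_before) * 8 / interval_s / 1_000_000


# Hypothetical counter samples taken 10 seconds apart.
before = 1_250_000_000
after = 1_375_000_000
rate = throughput_mbps(before, after, 10)
print(f"{rate:.1f} Mbps")  # 125 MB in 10 s -> 100.0 Mbps
```

If a reading like this were near the instance’s network cap during the latency spikes, that would explain why CPU and memory both look idle while outbound requests crawl.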
It would be great if the Render team could help us pinpoint the problem. Specifically, could you look at our instances from your side around 2023-01-30T18:13:00.000Z? Maybe you can see some resource or parameter of the instance being maxed out that we are missing in the public Render dashboard.
Any help would be much appreciated. This is critical for us, as we’re afraid we are overspending a significant part of our budget on instances with underutilized CPU.
I’d also be happy to reproduce the issue on request if that would help you find the root cause.
Thank you.