Understanding managed postgres

We are on the pro version of the managed Postgres product. Occasionally (once every six weeks or so) we get CPU spikes that go to 100%, and basically everything in our app becomes bottlenecked as the DB is getting throttled. Most times I restart the DB and things eventually settle back to normal, but it's obviously very disruptive and also requires manual intervention on my part.

I have observability, logging, and as much tooling as I can think of in place to try and understand why this happens when it does, but with managed resources like this there is only so much info I can get. From everything I have looked at, nothing out of the ordinary seems to be happening during these events (site traffic looks the same, queries look the same as usual, etc.).

Does anyone have any suggestions on how they would go about diving deeper into what is going on to understand this issue? Are the Postgres plans virtualized and on shared resources? Is it possible it could be noisy neighbors and nothing to do with my application itself? Has anyone kept their applications hosted on Render but moved the DB away to a different provider (something like Neon), and if so how were those experiences?

Any insight would be super helpful.

Hi Paul,

Have you taken a look at this doc: https://docs.render.com/postgresql-performance-troubleshooting (our datastores team has been making some really great additions to it recently)?

There are also apps like pgAdmin or pghero that can be useful for this kind of troubleshooting.
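If it helps to capture something during the next spike before restarting, here is a minimal sketch of the kind of snapshot those tools take. It reads the standard `pg_stat_activity` view and summarizes what is running. The `psycopg2` driver, the `snapshot` and `summarize` helper names, and the 30-second `slow_after` threshold are all assumptions for illustration, not anything Render-specific.

```python
from collections import Counter

# Non-idle sessions from the standard pg_stat_activity view, oldest query first.
ACTIVE_SQL = """
SELECT pid, state, wait_event_type,
       EXTRACT(EPOCH FROM (now() - query_start)) AS seconds_running,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE state <> 'idle' AND pid <> pg_backend_pid()
ORDER BY query_start
"""


def summarize(rows, slow_after=30.0):
    """Summarize rows shaped like (pid, state, wait_event_type, seconds_running, query).

    slow_after is an arbitrary, hypothetical threshold in seconds.
    """
    rows = list(rows)
    by_state = Counter(row[1] for row in rows)
    slow = [row for row in rows if row[3] is not None and row[3] >= slow_after]
    return {"by_state": dict(by_state), "slow": slow}


def snapshot(dsn):
    """Run the query once and summarize it (assumes psycopg2 is installed)."""
    import psycopg2  # assumption: any Postgres driver would work here

    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(ACTIVE_SQL)
            return summarize(cur.fetchall())
```

Calling `snapshot(...)` with your external connection string a few times during a spike and comparing the results against a quiet period should show whether a handful of long-running or lock-waiting queries are piling up, which traffic graphs alone can hide.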

If you want us to investigate potential noisy neighbors, feel free to file a support ticket from the dashboard and include the database ID. That said, it doesn’t sound like a noisy-neighbor type of issue: it happens on a regular basis, and restarting a service that was running on unhealthy infrastructure should schedule it onto new infrastructure (getting you new neighbors). But again, happy to take a look to more definitively rule that out.

Regards,

Matt
