I have a Render web service that relies on a Postgres database. I began with a free service and a starter database. Every request was taking a long time to load (around 5 seconds), but the metrics pages for both the service and the database looked fine: CPU well below 50%, and memory around 70-80% on the DB.
Upgrading the service to a starter instance didn’t help. But then upgrading the database to a standard instance did.
Given that the metrics for the smaller instances seemed fine, is there any way I could have known that upgrading the database was the way to go? Is it possible that the metrics were approaching their limits, but the level of granularity provided by Render's metrics was too low to show those "peaks"?
It's always hard to tell, because an application or database instance has many components that use resources in different ways. For example, databases generally perform better with more RAM, since Postgres can serve data from its in-memory cache instead of reading it from the SSD each time, which makes retrieval faster. For other applications, it depends on the technology or programming language you're using and how your application uses memory and CPU.
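One thing you can check yourself on the database side is how often Postgres serves data from its buffer cache versus reading it from disk. Here's a rough sketch of that check, assuming a `DATABASE_URL` connection string and the `psycopg2` package (neither is specific to Render, and what counts as a "good" ratio depends on your workload):

```python
# Rough check of how often Postgres serves blocks from its shared buffer cache
# vs. reading them from disk, using the built-in pg_stat_database view.
# Assumes DATABASE_URL is set and psycopg2 is installed.
import os
import psycopg2

QUERY = """
SELECT
  sum(blks_hit)  AS hits,
  sum(blks_read) AS reads,
  sum(blks_hit)::float / NULLIF(sum(blks_hit) + sum(blks_read), 0) AS hit_ratio
FROM pg_stat_database;
"""

def main():
    conn = psycopg2.connect(os.environ["DATABASE_URL"])
    try:
        with conn.cursor() as cur:
            cur.execute(QUERY)
            hits, reads, ratio = cur.fetchone()
            print(f"buffer hits: {hits}, disk reads: {reads}, hit ratio: {ratio}")
    finally:
        conn.close()

if __name__ == "__main__":
    main()
```

On a read-heavy workload, a hit ratio that stays well below ~0.99 is often a hint that the working set no longer fits in RAM, even when CPU looks healthy.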
You can easily test web services by upgrading and downgrading the instance type with minimal cost impact, but that’s not as easy with databases since they can’t be downgraded.
As you mentioned, it’s also possible that our metrics don’t have enough granularity to catch sudden spikes in memory or CPU usage.
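If you suspect short-lived spikes that a coarser dashboard sampling interval could smooth over, one workaround on the service side is to log resource usage yourself at a finer interval and inspect the service logs. A minimal sketch, assuming the `psutil` package is available in your service's environment:

```python
# Log process CPU and memory at 1-second intervals so short spikes show up
# in the service logs even if the hosted metrics dashboard samples more coarsely.
# Assumes psutil is installed; run this in a background thread alongside the app.
import threading

import psutil

def log_resource_usage(interval: float = 1.0) -> None:
    proc = psutil.Process()  # current process
    while True:
        cpu = proc.cpu_percent(interval=interval)  # blocks for `interval` seconds
        mem_mb = proc.memory_info().rss / (1024 * 1024)
        print(f"cpu={cpu:.1f}% rss={mem_mb:.1f}MB")

# Start sampling in the background when the service boots.
threading.Thread(target=log_resource_usage, daemon=True).start()
```

For the database, the closest equivalent is periodically sampling Postgres's own statistics views (like the cache hit ratio query above), since you can't run arbitrary processes on a managed database instance.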