Is there any guidance regarding how to configure the number of workers per instance for web services? Should I be doing something clever like detect the number of CPU cores or use some fixed number? Or just have a single worker per instance?
I’m asking specifically for the case of a dockerized Python app, but guidance could apply to any stack.
Every app is different and will have different resource requirements, so the only reliable way to tune the worker count is to monitor metrics such as CPU and memory usage under real traffic. Load testing can also help you find those limits.
I’d suggest starting low and increasing if you find you have resource headroom. Setting the worker count as an environment variable and referencing that in the code will give you flexibility to change it without having to make any further code commits.
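As a rough sketch of the environment variable approach, something like the following could work. The variable name `WEB_WORKERS` and the helper `resolve_worker_count` are made up for illustration, not taken from any particular framework, and the fallback of a single worker just reflects the "start low" advice.

```python
import os


def resolve_worker_count(environ=None, default=1):
    """Return the worker count from WEB_WORKERS, or `default` if unset or invalid.

    WEB_WORKERS is a hypothetical variable name chosen for this example.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("WEB_WORKERS", "").strip()
    if not raw:
        # Variable not set (or empty): start low.
        return default
    try:
        count = int(raw)
    except ValueError:
        # Non-numeric value: ignore it rather than crash at startup.
        return default
    # Zero or negative worker counts make no sense; fall back to the default.
    return count if count >= 1 else default


if __name__ == "__main__":
    print(resolve_worker_count())
```

You would then pass the result to whatever starts your workers, and change the count per deployment with something like `docker run -e WEB_WORKERS=4 ...` rather than a code commit. If you happen to run under Gunicorn, its `workers` setting in `gunicorn.conf.py` is one place the value could go.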