A background worker of mine that has been running fine for weeks sent me a notification last night that it “failed”. The logs up until the failure look completely normal, and then show the server restarting.
Any ideas as to how I can try to figure out what happened?
Thanks for reaching out.
Looking at your Background Worker, it shows “Server unhealthy” in the Events tab, which correlates with RAM metrics exceeding the plan limit at that time.
The email notification could certainly be better worded/more explicit and I’ll raise that with the team.
Thanks, that at least explains what happened! I definitely would appreciate more clarity in the email.
Did something recently change in the RAM monitoring? Ever since this first occurrence, I get this notice virtually every time I push a new release (which understandably comes with a doubling of memory usage).
I’m not aware of any changes to the RAM metrics, but I’m also not seeing a direct correlation with deploys: some spikes/unhealthy events from around 15 hours ago didn’t start until almost an hour after the previous deploy.
Maybe the jobs now require more resources? Has the number of jobs being processed increased? Or maybe your Celery concurrency settings are too high for the plan?
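If concurrency does turn out to be the culprit, Celery's worker settings can be tuned to fit under the plan's memory ceiling. A minimal sketch, assuming a prefork worker; the specific numbers are placeholders to adjust for your plan's RAM limit:

```python
# celeryconfig.py -- hypothetical values; tune to your plan's memory limit.

# Number of worker processes. Each prefork child carries its own copy of
# the app's baseline memory, so fewer children means lower peak RAM.
worker_concurrency = 2

# Recycle a child process after it has consumed ~200 MB (value is in
# kilobytes), which bounds gradual memory growth between deploys.
worker_max_memory_per_child = 200_000

# Fetch one task per child at a time so a burst of queued jobs doesn't
# get prefetched into memory all at once.
worker_prefetch_multiplier = 1
```

The same limits can be passed on the command line, e.g. `celery -A proj worker --concurrency=2 --max-memory-per-child=200000` (where `proj` stands in for your app module).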