Services stuck in unhealthy loop, exiting status code 1. DB server is intermittently down

I have two services that have been running fine for months, and the last time they were deployed was several days ago.

About an hour ago they both started failing. They both seem to be stuck in a loop where they fail, try to restart, and fail again, constantly showing “Server unhealthy. Exited with status 1 while running an internal process.” Looking at the metrics tab, both services look to be well under their plan limits. There are no errors or anything amiss in my application logs.

Along with that, I have been receiving the occasional error in my Sentry dashboard: “Can’t reach database server at ${db-service-id}”, which means the service can’t connect to my PG service. It should be noted that we do have a Render-forced database maintenance window scheduled in three days.

The website that is running these services as their backends still seems to be operational, which is a relief but still confusing.

I have reached out to Render support, but have not heard back from them. Does anyone have any idea what could be going on, or how I could go about debugging the issue? My gut is telling me this is a Render system-level issue and not something to do with my application code, but I would like to do anything I can to make sure this is not the case.
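One way to narrow down the intermittent “Can’t reach database server” errors is to probe the database host on a timer and log when it is unreachable, then compare those timestamps with the restart loop and the status page. Below is a minimal Python sketch, assuming the database accepts plain TCP connections on the default Postgres port 5432. The names `can_connect` and `probe` and the host `my-db-host.example` are placeholders for illustration, not anything Render provides.

```python
import socket
import time
from datetime import datetime, timezone


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def probe(host: str, port: int, attempts: int = 5, interval: float = 1.0) -> list:
    """Try to connect `attempts` times, printing a timestamped line for each try."""
    results = []
    for i in range(attempts):
        ok = can_connect(host, port)
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        print(f"{stamp} {host}:{port} {'ok' if ok else 'UNREACHABLE'}")
        results.append(ok)
        if i < attempts - 1:
            time.sleep(interval)
    return results


if __name__ == "__main__":
    # Hypothetical hostname: replace with the internal host of your own database.
    probe("my-db-host.example", 5432, attempts=10, interval=5.0)
```

This only checks that a TCP connection can be opened. It does not authenticate or run a query, so a passing probe rules out basic network reachability problems but not database-level ones.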

We are seeing this exact issue as well — both forms. We are seeing the “Server unhealthy. Exited with status 1 while running an internal process.” loop as well as sporadic network failures, all across the board (github.com, pypy, google maps API, stytch API, our Render database, etc.)

It’s very unclear what we can do on our end to resolve this. It seems like random network connectivity issues within Render’s infrastructure.

Makes me feel a little better I am not the only one. Our systems look to be operational again and healthy.

If anyone from Render is reading this, I totally get that these things happen, but it is absolutely essential that you guys update your status page as soon as something happens. I had to drop everything I was doing to investigate this, and the first thing I looked at was your status page to try and determine if this was an issue on my end. I see that you guys have now updated your status, but at the time it was green.

Hello,

We’re currently investigating an incident related to these Server unhealthy events. See https://status.render.com/incidents/zlpp8npgl50n for updates.

Best Regards,

Matt

My build deploys well, I’m using it, and after half a day, sometimes sooner, I start seeing all of these:

Server unhealthy
Exited with status 1 while running your code.
November 9, 2022 at 12:27 PM

Nothing more is said. The logs stop because of this.

This is an initial project for a client and I’m seeing if I can trust the uptime enough to use Render. Is there a way that I can know what’s causing this state? Is the server too small? I’m blind trying to troubleshoot this, any takers?

If it is deploying fine and only exiting after running for a while, it could be a memory leak issue. Have you tried rolling back to a known good commit and seeing if it still runs? A while back I had this same issue, and it was due to a memory leak. In my case, I was running several thousand jobs that were calling async tasks but did not properly wrap them in async/await.
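To illustrate the kind of leak described above: the post doesn't say which language or job runner was involved, so this is a Python asyncio sketch of the general fix, not the original code. The names `run_bounded` and `demo` and the limit of 10 are made up for illustration. The idea is that every task is awaited and only a fixed number are in flight at once, instead of firing off thousands of tasks and never awaiting them.

```python
import asyncio


async def run_bounded(jobs, limit: int = 10):
    """Run zero-argument coroutine functions with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        # The semaphore keeps only `limit` job bodies running concurrently.
        async with semaphore:
            return await job()

    # gather awaits every task, so results are collected and nothing is
    # left running (and holding memory) after this call returns.
    return await asyncio.gather(*(guarded(job) for job in jobs))


async def demo():
    """Run 200 fake jobs and report how many ran and the peak concurrency."""
    in_flight = 0
    peak = 0

    async def fake_job():
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.001)  # stand-in for real I/O work
        in_flight -= 1
        return True

    results = await run_bounded([fake_job] * 200, limit=10)
    return len(results), peak


if __name__ == "__main__":
    count, peak = asyncio.run(demo())
    print(f"ran {count} jobs, peak concurrency {peak}")
```

The same shape applies in other runtimes: await the work you start, and cap how much of it runs at the same time.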

Hey Paul!

Thanks for helping! This is a simple Rails 7 app that does nothing but record questionnaires when someone fills them out. No one is using it; it sits idle with no cron jobs, simply waiting for the client to review the functionality. Is there a way to see the memory consumption?

@denisdaigle If it is a basic Rails app as you describe, I doubt it’s a memory leak issue like what I described. Have you looked at the metrics tab in Render to make sure you have enough resources allocated for your application? You may need to bump up to the next tier. Or it could def be on the Render side. I just looked at status.render.com and it looks like the Oregon datacenter is having issues, as well as some of their managed DB services. For what it’s worth, I am in the Oregon DC and using both Postgres and Redis through Render, and all our systems seem to be operational.

At this point I’m pretty convinced that it’s an issue with Render since it’s ok for a while (3 days), but then there’s a string of these down times, then it’s fine again. Not yet up to SLA guarantee standards, but if Render can be a good alt to Heroku then we can help ride out the bumps. I’m sure it’s infinitely more complex than we imagine for the staff right now.