I’m trying to deploy a Django app as a Docker service. The problem is that my dockerCommand runs over and over again indefinitely and the app never actually becomes available.
My dockerCommand runs a build script currently containing:
poetry run ./manage.py migrate
poetry run gunicorn -b 0.0.0.0:8000 ibproduct.wsgi:application
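For reference, here’s a sketch of that same script written out as a standalone start script; the set -e and exec lines are additions I’m assuming are safe, so a failed migration stops the deploy instead of falling through to gunicorn, and gunicorn receives container signals directly:
#!/usr/bin/env bash
# Stop immediately if any command fails, so a failed migration
# doesn't silently continue on to starting gunicorn.
set -e
# Apply database migrations first.
poetry run ./manage.py migrate
# exec replaces the shell process so gunicorn receives signals (e.g. shutdown) directly.
exec poetry run gunicorn -b 0.0.0.0:8000 ibproduct.wsgi:application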
This results in the two commands just running over and over again, with the deploy stuck on ‘In progress’.
I’ve tried upgrading my plan to Starter Plus after reading in another thread that this could be a memory problem. On the bigger plan the commands don’t visibly repeat, but the deploy still gets stuck on ‘In progress’ and the app isn’t working.
Questions:
Any thoughts on why this app isn’t working?
How does Render decide a deploy is ‘finished’? Is it once the dockerCommand has started and the healthCheckUrl returns 200?
Is it expected behaviour for dockerCommand to retry multiple times under some conditions?
I’ve been trying to get gunicorn to run just in the Render shell. It works fine within the same Docker image on my local machine, but on Render the following happens:
root@backend-vjls-shell:/usr/app# poetry run gunicorn -b 127.0.0.1:9000 ibproduct.wsgi
[2021-05-05 13:05:50 +0000] [164] [INFO] Starting gunicorn 20.1.0
[2021-05-05 13:05:50 +0000] [164] [INFO] Listening at: http://127.0.0.1:9000 (164)
[2021-05-05 13:05:50 +0000] [164] [INFO] Using worker: sync
[2021-05-05 13:05:50 +0000] [170] [INFO] Booting worker with pid: 170
[2021-05-05 13:05:50 +0000] [171] [INFO] Booting worker with pid: 171
[2021-05-05 13:05:50 +0000] [172] [INFO] Booting worker with pid: 172
[2021-05-05 13:05:50 +0000] [173] [INFO] Booting worker with pid: 173
[2021-05-05 13:06:15 +0000] [186] [INFO] Booting worker with pid: 186
[2021-05-05 13:06:21 +0000] [164] [WARNING] Worker with pid 178 was terminated due to signal 9
[2021-05-05 13:06:21 +0000] [164] [WARNING] Worker with pid 171 was terminated due to signal 9
[2021-05-05 13:06:21 +0000] [189] [INFO] Booting worker with pid: 189
[2021-05-05 13:06:21 +0000] [190] [INFO] Booting worker with pid: 190
[2021-05-05 13:06:29 +0000] [164] [WARNING] Worker with pid 183 was terminated due to signal 9
[2021-05-05 13:06:29 +0000] [193] [INFO] Booting worker with pid: 193
It seems like the workers keep getting killed for some reason. When I curl the app from within the Render shell I just get an empty response. If I run it without gunicorn (./manage.py runserver instead), I can at least query the app with curl from the Render shell.
The fact that the command is getting repeatedly restarted makes me think this might be a memory issue. Since the command doesn’t restart on the higher plan, I suspect that is indeed the case and that some other issue is causing the deploy to get stuck on ‘In progress’. Can you share the service ID for this service?
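If it is memory, signal 9 is what the kernel’s OOM killer sends. One thing worth trying (a sketch, assuming the sync worker class you’re already using) is to run fewer workers and recycle them periodically so the container stays inside its memory limit:
# The log above shows four workers booting; cap it at two and recycle each
# worker after 500 requests to limit per-process memory growth.
poetry run gunicorn -b 0.0.0.0:8000 --workers 2 --max-requests 500 ibproduct.wsgi:application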
How does Render decide a deploy is ‘finished’? Is it once the dockerCommand has started and the healthCheckUrl returns 200?
That is correct. Render will consider your app live when it responds with a 200 for the healthcheck path.
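One way to see exactly what the health checker sees is to hit the path yourself from the Render shell; a sketch, assuming the app is listening on port 8000 inside the container:
# Print just the HTTP status code for the health check path;
# the deploy is marked live once this returns 200.
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8000/stage/admin/login/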
I’m seeing that the healthcheck path /stage/admin/login/ is returning a 500 response. This seems to indicate that the server is now up and responding to requests but hitting an error for that path. It won’t be marked as live as a result.
Do you have any insight into why that would return a 500?
I’ve been able to identify and fix the 500 error on the health check path, so the deploy now ends up ‘live’.
However, if you actually go to that health check URL (or any other path) after the deploy is live, it just returns a 502 error. Do you know why this could be?
Your service seems to be up now. The issue on our end that I referenced earlier is related to overriding the port with the PORT env var after the service is created; I hadn’t completely addressed it initially. Once the service was up and returning 502s, I was able to identify the root cause and fully fix it.
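For anyone else hitting 502s after a deploy goes live, a sketch of the app-side check, assuming the PORT env var holds the port traffic is routed to: bind gunicorn to it rather than a hard-coded port, falling back to 8000 when it isn’t set.
# Listen on the port the platform expects; default to 8000 when PORT is unset.
exec poetry run gunicorn -b 0.0.0.0:${PORT:-8000} ibproduct.wsgi:application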
I am going to see if we can get this issue prioritized so it doesn’t cause problems in the future. Let me know if you have any more issues.