How to debug issues with deploy-time health check?

jrr · May 23, 2023, 7:09pm

Lately some of our deploys seem to get stuck on the deploy-time health check, e.g.:

Waiting for internal health check to return a successful response code at: foo-bar.onrender.com:43799/api/health-check

Usually a “Clear build cache & deploy” seems to unstick it.

I’d like to get a better understanding of what’s going on, and I’m not sure where to start.

Note also that Render lists “internal addresses” of e.g.:

TCP foo-bar:10000
HTTP foo-bar:43799

A couple specific questions:

How can I run a representative health-check myself? (curl the foo-bar.onrender.com:43799 address from the dashboard’s shell? Or maybe foo-bar:43799? Will that talk to the old service or the new candidate?)
How can I view the application log from the new candidate version? (the one that seems to not be responding to health checks?)

In general it feels like there’s not great visibility into the deployment process. A few ideas for improvements to the dashboard:

Display a log of health check attempts. (could include time, URL, response, and the application log from the target service)
Display more information for each deployment attempt: How long did the build take? What was the size of the cache loaded? What was the size of the cache saved? What was the size of the produced slug?

Thanks.

Jade_Paoletta · May 23, 2023, 10:23pm

Hi John,

Thanks for reaching out here. We have come across a bug in some more recent versions of Next.js that causes an additional, random port to be exposed on the service. It is similar to the issue described here: https://github.com/vercel/next.js/issues/49677.

This has caused some issues with deploys when the health check attempts to connect on this port. In my testing, the issue can be reproduced on Render with Next.js 13.4. I did not see the issue when downgrading to 13.3, but some users on that thread still report it on 13.3. To get around this, you can try downgrading your Next.js version. Alternatively, setting a PORT environment variable on your service with the value 10000 will skip the port detection process altogether and manually define 10000 as the port to serve public traffic from.

Regarding your questions around the health check, this is an internal check, so curling the internal host:port address from the shell of another service will mimic this. However, this would connect to the live instance of your service, which would not be the new instance of your service spinning up during a deploy.
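As a rough illustration of that kind of manual check, here is a minimal Python sketch that could be run from the shell of another service, assuming Python is available there. The `probe` helper is hypothetical, not a Render tool, and treating only 2xx responses as success is an assumption about how the platform judges a health check.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> tuple:
    """Send one GET request and report (succeeded, detail).

    Assumption: a 2xx response counts as a passing health check.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300, f"HTTP {resp.status}"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return False, f"HTTP {err.code}"
    except OSError as err:
        # Connection refused, DNS failure, timeout, and similar
        # (urllib.error.URLError is a subclass of OSError).
        return False, f"no response: {err}"
```

For example, `probe("http://foo-bar:10000/api/health-check")` would use the internal address and health check path from this thread. As noted above, the request reaches the live instance, not the candidate that is still starting up.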

Service logs emitted by the new version of your application should also appear in the deploy logs before the service is marked as live. So if you’ve configured your service to emit a log line for each request to your health check path, those lines should appear there.
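To make the idea of logging health check requests concrete, here is a minimal sketch using only the Python standard library. It is not the Next.js setup from this thread; the `Handler` class and `make_server` helper are hypothetical names, and the same idea would apply to a handler in any framework. It also reads PORT with a fallback of 10000, mirroring the workaround above.

```python
import logging
import os
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

HEALTH_PATH = "/api/health-check"  # path taken from the thread
log = logging.getLogger("health")


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == HEALTH_PATH:
            # One line per health check request, so every attempt is visible.
            log.info("health check from %s", self.client_address[0])
            status, body = 200, b"ok"
        else:
            status, body = 404, b"not found"
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request stderr line; logging is explicit above.
        pass


def make_server(port: Optional[int] = None) -> HTTPServer:
    """Bind to the given port, else to PORT, else to 10000 (assumed default)."""
    if port is None:
        port = int(os.environ.get("PORT", "10000"))
    return HTTPServer(("0.0.0.0", port), Handler)
```

To run it, call `logging.basicConfig(level=logging.INFO)` and then `make_server().serve_forever()`. Each health check request then produces one log line.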

I definitely agree with you that there’s a lot of room for improvement around visibility during deploys, and I think you bring up some valid feedback. I’m happy to relay this to our product team, and would also encourage you to submit this feedback via our feature request board: https://feedback.render.com/features so that we can prioritize it accordingly.

jrr · May 26, 2023, 4:31pm

Thanks, Jade! The PORT env var has cleared up our flaky deploys.

Added a feature request for more deploy info here: Detailed build/deploy logs | Feature Requests | Render

system · June 25, 2023, 4:32pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
