Lately some of our deploys seem to get stuck on the deploy-time health-check, e.g.:
Waiting for internal health check to return a successful response code at: foo-bar.onrender.com:43799/api/health-check
Usually a “Clear build cache & deploy” seems to unstick it.
I’d like to get a better understanding of what’s going on, and I’m not sure where to start.
Note also that Render lists “internal addresses” of e.g.:
TCP foo-bar:10000
HTTP foo-bar:43799
A couple specific questions:
How can I run a representative health-check myself? (curl the foo-bar.onrender.com:43799 address from the dashboard’s shell? Or maybe foo-bar:43799? Will that talk to the old service or the new candidate?)
How can I view the application log from the new candidate version? (the one that seems to not be responding to health checks?)
In general it feels like there’s not great visibility into the deployment process. A few ideas for improvements to the dashboard:
Display a log of health check attempts. (could include time, URL, response, and the application log from the target service)
Display more information for each deployment attempt: How long did the build take? What was the size of the cache loaded? What was the size of the cache saved? What was the size of the produced slug?
Thanks for reaching out here. We have come across a bug in some recent versions of Next.js that causes an additional, random port to be exposed on the service. It is similar to the issue described here: https://github.com/vercel/next.js/issues/49677.
This has caused some issues with deploys when the health check attempts to connect on this port. In my testing, the issue can be reproduced on Render with Next.js 13.4. I did not see the issue after downgrading to 13.3, but some users on that thread still report it on 13.3. To get around this, you can try downgrading your Next.js version. Alternatively, setting a PORT environment variable on your service with the value 10000 will skip the port detection process altogether and manually define 10000 as the port to serve public traffic from.
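If your service starts its own listener rather than relying on the framework's start command, the PORT setting only helps if the app actually binds to that port. A minimal Python sketch of reading it, assuming the app is free to choose its listening port (`resolve_port` is a hypothetical helper name, and the 10000 fallback simply mirrors the value suggested above):

```python
import os


def resolve_port(environ=None, default=10000):
    """Return the port to listen on, preferring the PORT environment variable.

    `default` mirrors the 10000 value mentioned in this thread; it is an
    assumption of this sketch, not something the platform enforces here.
    """
    env = os.environ if environ is None else environ
    raw = env.get("PORT", "").strip()
    if not raw:
        # PORT is unset or empty: fall back to the default.
        return default
    port = int(raw)  # raises ValueError if PORT is not numeric
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return port
```

Failing loudly on a malformed PORT is a deliberate choice in this sketch: a service that silently binds to some other port is harder to debug than one that refuses to start.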
Regarding your questions around the health check, this is an internal check, so curling the internal host:port address from the shell of another service will mimic this. However, this would connect to the live instance of your service, not the new instance that is spinning up during a deploy.
Service logs that are emitted from the new version of your application should also appear in the deploy logs, before the service is marked as live. So if you’ve configured your service to emit a log for requests to your health check path, they should appear there.
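If curl is not handy in that shell, the same kind of probe can be sketched in Python using only the standard library. This is an illustration, not Render's actual health checker: `probe` and `ProbeResult` are hypothetical names, and the URL in the comment just reuses the internal address and path from the question.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import NamedTuple, Optional


class ProbeResult(NamedTuple):
    status: Optional[int]  # HTTP status code, or None if no HTTP response arrived
    detail: str            # "ok", the HTTP reason phrase, or the connection error
    seconds: float         # wall-clock time spent on the attempt


def probe(url, timeout=5.0):
    """Send one GET request to `url` and report what came back."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return ProbeResult(resp.status, "ok", time.monotonic() - start)
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return ProbeResult(exc.code, str(exc.reason), time.monotonic() - start)
    except (urllib.error.URLError, OSError, http.client.HTTPException) as exc:
        # Connection refused, DNS failure, timeout, or a non-HTTP reply.
        return ProbeResult(None, str(exc), time.monotonic() - start)


# Example call (address taken from the question above):
#   print(probe("http://foo-bar:43799/api/health-check"))
```

A `status` of None means nothing HTTP-shaped answered on that port, which is a different failure from the app answering with a 5xx, so the sketch keeps the two cases apart.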
I definitely agree with you that there’s a lot of room for improvement around visibility during deploys, and I think you bring up some valid feedback. I’m happy to relay this to our product team, and would also encourage you to submit this feedback via our feature request board: https://feedback.render.com/features so that we can prioritize it accordingly.