Private network downtime during deploy

nordbear · June 15, 2023, 5:15pm

Hello! I recently noticed that zero-downtime deploys aren’t possible for private services because they have no health checks. I tried to get around this by having a web service listen on two ports: 10000 for public-facing traffic and 8080 for private traffic. So, for example, I would expose the health check on port 10000 and the rest of the API on 8080 for private traffic only.
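For readers who want to picture the two-port setup described above, here is a minimal sketch in Python using only the standard library. It is not the poster's actual code: the `/healthz` path, the handler names, and the `start_servers` helper are assumptions made for illustration, and only the port numbers 10000 and 8080 come from the post.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers the health check on the public port."""

    def do_GET(self):
        # "/healthz" is a hypothetical path; use whatever path the health check is set to.
        status = 200 if self.path == "/healthz" else 404
        self.send_response(status)
        self.end_headers()
        self.wfile.write(b"ok" if status == 200 else b"not found")

    def log_message(self, *args):
        # Silence per-request logging to keep the example quiet.
        pass


class ApiHandler(BaseHTTPRequestHandler):
    """Serves the real API, meant for private-network callers only."""

    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"hello from the private API")

    def log_message(self, *args):
        pass


def start_servers(public_port=10000, private_port=8080):
    """Start both listeners in background threads and return the server objects."""
    servers = [
        ThreadingHTTPServer(("0.0.0.0", public_port), HealthHandler),
        ThreadingHTTPServer(("0.0.0.0", private_port), ApiHandler),
    ]
    for server in servers:
        threading.Thread(target=server.serve_forever, daemon=True).start()
    return servers
```

To run it as a service, one would call `start_servers()` and then keep the main thread alive, for example with `threading.Event().wait()`. Note that the health check only proves the process is up on the public port; it says nothing about when traffic on the private port is switched over.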

The above works fine, but I noticed that during a deploy the private endpoint would be down for a couple of seconds anyway, even though the deploy waited for the health check on the public port.

Any idea if it’s possible to get around this some way? I’m not sure what the point of the private network is if services can’t be deployed without downtime, so it feels like there should be some way of solving this. I realize that my approach might be a bit of a hack, but it would be really nice to have this working.

nordbear · June 17, 2023, 6:52pm

Did some more testing now and actually managed to get it working using both a web and a private service. What I did was add a delay to the SIGTERM handler, which keeps the app-to-be-killed running for a while before it shuts down. I set it to 20s just to test and it seems to have done the trick. I assume the load balancer still sends a couple of requests to the old service for some reason, and without the delay they would not be served(?).
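The delayed-shutdown idea described above could be sketched roughly as follows in Python. This is an illustration, not the poster's code: the `GracefulShutdown` class and its `stop_server` callable are hypothetical names, and the sketch assumes the platform waits longer than the grace period before force-killing the old instance. Only the 20s value comes from the post.

```python
import signal
import threading
import time

GRACE_SECONDS = 20  # the value tried in the post; tune as needed


class GracefulShutdown:
    """Keeps the process serving for a grace period after SIGTERM."""

    def __init__(self, stop_server, grace_seconds=GRACE_SECONDS):
        self._stop_server = stop_server  # callable that stops the server
        self._grace_seconds = grace_seconds
        self._term_received = threading.Event()
        self.stopped = threading.Event()  # set once the server has been stopped

    def install(self):
        # signal.signal must be called from the main thread.
        signal.signal(signal.SIGTERM, self.handle_sigterm)

    def handle_sigterm(self, signum=None, frame=None):
        # Only the first SIGTERM starts the countdown; repeats are ignored.
        if self._term_received.is_set():
            return
        self._term_received.set()
        threading.Thread(target=self._drain_then_stop, daemon=True).start()

    def _drain_then_stop(self):
        # Keep serving while requests may still be routed to this old instance.
        time.sleep(self._grace_seconds)
        self._stop_server()
        self.stopped.set()
```

With a server running in a background thread, usage would be along the lines of `shutdown = GracefulShutdown(server.shutdown)`, then `shutdown.install()`, then `shutdown.stopped.wait()` in the main thread so the process exits only after the grace period has passed.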

I would really appreciate some official feedback on this anyway, to understand whether this is a viable approach or not. And if it is, maybe it should be documented somewhere. Overall, it would be great to have some more in-depth info on how the health checks and load balancer are set up, especially when it comes to deploys.

Thx!

mmaddex · June 20, 2023, 8:31pm

Hi there,

We do have some documentation on this process at https://render.com/docs/deploys#how-render-uses-health-check-paths.

Let me know if you still have more specific questions that are not answered by that page, and I’d be happy to see about getting it updated.

Regards,

Matt

nordbear · June 21, 2023, 7:31am

Hey! I think my main concern is that there seems to be downtime when connecting to a service (web or private) over the private network, even with a health check set up. Not sure if that’s a bug or not, but like I wrote above, I managed to fix it by delaying the SIGTERM handler. Is this a viable approach?

Regarding the health check docs, it’s not really clear how the load balancer directs traffic during a deploy. It appears that it still sends a few requests to the old service even though the new one has a successful health check (according to my tests above). I don’t really mind this, but if it’s necessary to delay the SIGTERM handler, like you would usually do in a k8s cluster, I think it should be written or explained somewhere.

system · July 21, 2023, 7:32am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
