What is being done to stop outages and improve uptime?

Ibrahim_Irfan · June 10, 2021, 9:46pm

The uptime (or lack thereof) makes it very difficult to rely on Render despite its convenience. What is being done to mitigate this going forward?

Otherwise I will have no option but to migrate off, which would be a shame since it’s a joy to use.

anurag · June 11, 2021, 1:15am

Hi @Ibrahim_Irfan and others,

Over the last week, two of our upstream providers (Fastly and AWS) have both had outages, leading to the incidents you’ve seen. The incidents also affected a large part of the Internet, and providers like Heroku weren’t spared either.

The obvious solution to this is to build redundancy and high availability into the system; we could run your applications and sites across multiple CDNs and multiple availability zones. Besides the engineering work, this will also cost more, but we do plan to offer it as an option.

Beyond this, we’re also working to strengthen our systems against DDoS attacks and traffic spikes which have caused incidents in the past.

System reliability is our top priority: much more so than new features, and we take it incredibly seriously. We’re spending considerable engineering and financial resources to make sure we can get as close to 100% uptime as possible. It will take some time, but we’re determined to get there.

samlauff · August 23, 2022, 6:38pm

Hi @anurag, are there any updates here? I could be missing it, but I don’t see these options (such as multiple availability zones) in the docs. It would be cool if there were a high-availability option in the config, so that when it is set, the stack is automatically replicated in another availability zone (with automatic DNS transfer on failover).

anurag · August 23, 2022, 8:34pm

If your app runs multiple instances, we try to put them on different AZs to the extent possible. This happens automatically; no configuration is needed from you (aside from scaling up or using autoscaling).

Rob_Witman · October 19, 2022, 8:50pm

“If your app runs multiple instances, we try to put them on different AZs to the extent possible. This happens automatically; no configuration is needed from you (aside from scaling up or using autoscaling).”

That is great, but it also prompts a couple of follow-up questions.

Is it visible to us which instances are where? Ideally we would have the majority of our instances in the zone that we initially selected, as that is where our database is as well (a third-party MongoDB).

Will it be possible to see or change the load balancing that is being done?

Additionally, because the redundancy is coupled with our scaling, we are forced to use boxes of the same size. That is OK, but ideally we would use several high-powered boxes in our chosen region (Ohio) and then have a single smaller box available in Oregon just in case… OR boot up identical boxes in Oregon but ONLY when Ohio has failed. Does that make sense?
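To make the second option concrete, here is a minimal sketch of the decision logic only: all capacity runs in the primary region while it is healthy, and identical capacity is started in the standby region only when the primary has failed. The names `Region` and `plan_instances` are hypothetical and purely illustrative; this is not Render's API, and how health would actually be detected is left out.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical description of a region and whether it is currently healthy."""
    name: str
    healthy: bool


def plan_instances(primary: Region, standby: Region, desired: int) -> dict[str, int]:
    """Return how many instances each region should run (cold-standby failover)."""
    if primary.healthy:
        # Normal operation: keep everything in the primary region, near the database.
        return {primary.name: desired, standby.name: 0}
    # Primary has failed: boot identical capacity in the standby region instead.
    return {primary.name: 0, standby.name: desired}
```

For example, `plan_instances(Region("ohio", False), Region("oregon", True), 4)` would place all four instances in Oregon.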

Thanks

Rob_Witman · October 21, 2022, 1:32pm

@anurag We are looking to move our ~20 servers from Heroku and would appreciate the chance to talk with someone on your team. Could you please have the appropriate person reach out? Our trials have gone pretty smoothly, but we have some follow-on questions, including the above, and it would be great to get a dialogue going.
Thanks

anurag · October 21, 2022, 2:50pm

Very happy to chat more and answer questions. Would you mind emailing sales@render.com with your account email so we can start the process?

Rob_Witman · October 25, 2022, 9:04pm

Thanks @anurag. I sent something last week and haven’t heard back yet.
