What is being done to stop outages and improve uptime?

The uptime (or lack thereof) makes it very difficult to rely on Render despite its convenience. What is being done to mitigate this going forward?

Will have no option but to migrate off, which would be a shame since it’s a joy to use.

Hi @Ibrahim_Irfan and others,

Over the last week, two of our upstream providers (Fastly and AWS) have both had outages, leading to the incidents you’ve seen. The incidents also affected a large part of the Internet, and providers like Heroku weren’t spared either.

The obvious solution to this is to build redundancy and high availability into the system; we could run your applications and sites across multiple CDNs and multiple availability zones. Besides the engineering work, this will also cost more, but we do plan to offer it as an option.

Beyond this, we’re also working to strengthen our systems against DDoS attacks and traffic spikes which have caused incidents in the past.

System reliability is our top priority: much more so than new features, and we take it incredibly seriously. We’re spending considerable engineering and financial resources to make sure we can get as close to 100% uptime as possible. It will take some time, but we’re determined to get there.

Hi @anurag. Are there any updates here? I could be missing it, but I don’t see these options (multiple availability zones) in the docs. It would be great to have a high availability option in the config so that, when it is set, the stack is automatically replicated in another availability zone (with automatic DNS transfer on failover).

If your app runs multiple instances, we try to put them on different AZs to the extent possible. This happens automatically; no configuration is needed from you (aside from scaling up or using autoscaling).

That is great, but it also prompts a couple of follow-up questions.

Is it visible to us which instances are where? Ideally we would have the majority of our instances in the zone that we initially selected as that is where our database is as well (3rd party MongoDB).

Will it be possible to see or change the load balancing that is being done?

Additionally, by coupling the redundancy with our scaling, we are forced to use the same size boxes. That is OK, but ideally we would use several high-powered boxes in our chosen region (Ohio) and then have a single smaller box available in Oregon just in case… OR boot up identical boxes in Oregon but ONLY when Ohio has failed. Does that make sense?
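
To make the first option concrete, here is a rough sketch of the routing rule I have in mind: traffic stays in Ohio while any instance there is healthy and only falls back to the Oregon box otherwise. Everything here (`Instance`, `pick_targets`, the instance names) is made up for illustration and is not a Render API or config option.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    # Hypothetical record for one running box; not a Render type.
    name: str
    region: str
    healthy: bool = True


def pick_targets(instances, primary="ohio", standby="oregon"):
    """Return the instances that should receive traffic.

    Traffic stays in the primary region while at least one instance
    there is healthy; otherwise it fails over to the standby region.
    """
    primary_up = [i for i in instances if i.region == primary and i.healthy]
    if primary_up:
        return primary_up
    return [i for i in instances if i.region == standby and i.healthy]


# Several boxes in Ohio plus one small standby in Oregon (names assumed).
fleet = [
    Instance("web-1", "ohio"),
    Instance("web-2", "ohio"),
    Instance("web-standby", "oregon"),
]
print([i.name for i in pick_targets(fleet)])
```

With all Ohio instances marked unhealthy, the same call would return only the Oregon standby.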

Thanks

@anurag We are looking to move our ~20 servers from Heroku and would appreciate the chance to talk with someone on your team. Could you please have the appropriate person reach out? Our trials have gone pretty smoothly, but we have some follow-up questions, including the above, and it would be great to get a dialog going.
Thanks

Very happy to chat more and answer questions. Would you mind emailing sales@render.com with your account email so we can start the process?

Thanks @anurag. I sent something last week and haven’t heard back yet.