We’re using Render and we get very spiky load (think 600 requests/second). The service can handle it, but our Render service makes an outbound call to an external service. It seems like after a certain limit, those outbound requests fail.
Aug 24 05:18:15 PM {
Aug 24 05:18:15 PM "message": "request to https://eth-rinkeby.alchemyapi.io/v2/123 failed, reason: connect ETIMEDOUT 3.208.5.125:443",
Aug 24 05:18:15 PM "type": "system",
Aug 24 05:18:15 PM "errno": "ETIMEDOUT",
Aug 24 05:18:15 PM "code": "ETIMEDOUT"
Aug 24 05:18:15 PM }
We assumed it was the service provider that couldn’t handle the load. However, after chatting with the service provider, it seems like some requests don’t even hit them and they haven’t throttled a single request. This makes me think maybe Render is throttling/blocking/timing out these requests. Could this be the case?
Just to make sure I understand, you say that requests are successful but after a time they start to fail? Do they eventually start working? What has your workaround been so far?
Yes. I’m load testing my server with ~300 requests per 30 seconds and after, say, 200 requests, they start to fail and time out.
They eventually start working again after some time (like a minute or two).
We have not found any workaround.
We talked to the service provider and they do not see the requests, so I assume it’s either a Render or Node/JavaScript/library problem. I haven’t been able to deduce what the problem is.
As a follow-up, does Render proactively block requests in an attempt to prevent DDoS attacks coming from Render? Could that be it?
As another data point, I ran the exact same code locally and load tested it with a greater number of requests per second, and I never once got a timeout error from the service provider.
Obviously it’s not an exact apples-to-apples comparison, but could this mean Render is throttling/blocking/timing out some of these requests?
EDIT: I lied! I’m actually getting timeouts locally now, but my localhost seems to be able to handle a greater number of requests per second (like 2-3x Render’s).
+1 on this, we see the exact same behavior with external API calls. When hundreds/thousands of requests are made in a short amount of time, they fail with ETIMEDOUT.
Running it locally fails much less often than on Render.
We’ve investigated Node memory limits and tried using promise pools to limit concurrent async requests, all with no luck.
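For anyone else trying the promise-pool approach mentioned above, here is a minimal sketch of what that can look like in plain Node. The name runWithLimit and the limit value are made up for illustration, and, as noted above, capping concurrency on the application side did not resolve the timeouts reported in this thread, so treat it as a way to rule things out rather than a fix.

```javascript
// Minimal promise pool: runs at most `limit` tasks at the same time.
// `tasks` is an array of functions that each return a promise.
// `runWithLimit` is a hypothetical helper name, not part of any library.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;

  // Each worker keeps claiming the next unstarted task until none are left.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  // Start up to `limit` workers; if any task rejects, the returned promise rejects.
  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Usage would be something like runWithLimit(urls.map((url) => () => callProvider(url)), 20), where callProvider stands in for whatever function makes the outbound request. Results come back in the same order as the input tasks.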
Thanks for bringing this to our attention @michael_winter and @Ibrahim_Irfan. @borko and I found that our GCP gateway was indeed dropping packets destined for eth-rinkeby.alchemyapi.io at the time @michael_winter observed this behavior. There’s a known issue with GCP gateways dropping packets when there are many simultaneous connections to the same address. We are planning to address this by reconfiguring our gateway. I can’t yet provide an estimate on when that change will ship, but we are planning to start working on it right away.
Meanwhile, if this continues to be a problem, we can help you migrate to our newer setup on AWS, where this should not be an issue. Any services created in Ohio, Frankfurt, or Singapore will use AWS. If you’d like to continue using Oregon, you can create a new team and its services should also use AWS.
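Since the gateway issue described above is tied to many simultaneous connections to the same address, one thing that can be tried on the application side is reusing connections and capping how many are open at once. The sketch below uses Node’s built-in https.Agent with keepAlive and maxSockets. Whether this is enough to avoid the dropped packets is an assumption and was not confirmed in this thread; the helper name postJson and all numeric values are placeholders.

```javascript
const https = require('node:https');

// Shared agent: keeps TCP connections open for reuse and caps the number of
// concurrent sockets per host. The numbers are illustrative, not tuned.
const sharedAgent = new https.Agent({
  keepAlive: true,
  maxSockets: 50,
  maxFreeSockets: 10,
});

// Hypothetical helper: POSTs a JSON payload through the shared agent.
function postJson(url, payload) {
  return new Promise((resolve, reject) => {
    const body = JSON.stringify(payload);
    const req = https.request(
      url,
      {
        method: 'POST',
        agent: sharedAgent,
        timeout: 10000, // socket idle timeout in milliseconds
        headers: {
          'Content-Type': 'application/json',
          'Content-Length': Buffer.byteLength(body),
        },
      },
      (res) => {
        let data = '';
        res.setEncoding('utf8');
        res.on('data', (chunk) => { data += chunk; });
        res.on('end', () => resolve({ status: res.statusCode, body: data }));
      }
    );
    // A 'timeout' event does not abort the request by itself, so destroy it.
    req.on('timeout', () => req.destroy(new Error('request timed out')));
    req.on('error', reject);
    req.end(body);
  });
}
```

Requests beyond maxSockets are queued by the agent until a socket frees up, so the cap is enforced below the application code. Many HTTP client libraries accept an agent option, so the same agent can usually be passed to them instead of calling https.request directly.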
Can you please help us migrate our current team to AWS? (We’d like to keep the same URLs/hostnames/etc).
It’s pretty bad/unacceptable that GCP drops packets. I understand it’s outside of your control, but that’s a pretty egregious issue for us, since we’re dealing with financial/blockchain technology that requires consistency.
Hi @michael_winter, I’m actually hopeful that we can ship the configuration change I mentioned very soon, so I’d recommend not migrating. Migrating between clusters is complicated and would involve downtime. Keeping the same onrender.com hostnames makes it trickier still.
We understand how disruptive this is for you, and appreciate your patience.
Hi @michael_winter and @Ibrahim_Irfan. We shipped that change and are seeing much lower rates of dropped packets. Let us know if this fixes the issues you were seeing. Thanks again for bringing this to our attention and helping us make Render better!
Is it guaranteed that packets are no longer dropped? We build crypto infrastructure and we need our packets not to get dropped, or else we risk losing money.
In general it’s not possible to guarantee that packets won’t be dropped. I can say that since we shipped this configuration change our GCP gateway hasn’t dropped any packets because of the issue I mentioned above. With this lower rate of dropped packets you should no longer see connection timeouts (the ETIMEDOUT errors you mentioned in your original report). TCP/IP automatically retransmits dropped packets, so packet loss only leads to connection errors when it happens at a high rate.
Given how critical these outbound requests are for you, I’d recommend that you add monitoring to your application so you can be notified when outbound requests are failing at an unacceptable rate (whatever that rate may be for you). If possible, you may want to retry on failures as well.
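As a rough illustration of the retry and monitoring suggestion above, here is a minimal sketch in Node. The names withRetry, failureRate, and stats, the list of retryable error codes, and the delay values are all assumptions made for the example. Retrying is only safe for requests that can be repeated without side effects; a call that submits a transaction would need its own deduplication.

```javascript
// Connection-level error codes treated as retryable (an assumption; adjust as needed).
const RETRYABLE_CODES = new Set(['ETIMEDOUT', 'ECONNRESET', 'ECONNREFUSED']);

// In-memory counters; a real setup would export these to a metrics/alerting system.
const stats = { attempts: 0, failures: 0 };

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls `operation` (a function returning a promise) and retries it on
// retryable errors, up to `retries` extra attempts, with exponential backoff.
async function withRetry(operation, { retries = 3, baseDelayMs = 200 } = {}) {
  for (let attempt = 0; ; attempt++) {
    stats.attempts++;
    try {
      return await operation();
    } catch (err) {
      stats.failures++;
      const retryable = RETRYABLE_CODES.has(err.code);
      if (!retryable || attempt >= retries) throw err;
      // Backoff doubles each attempt, plus random jitter to avoid synchronized retries.
      const delayMs = baseDelayMs * 2 ** attempt + Math.random() * baseDelayMs;
      await sleep(delayMs);
    }
  }
}

// Fraction of attempts that failed so far; compare against your own threshold.
function failureRate() {
  return stats.attempts === 0 ? 0 : stats.failures / stats.attempts;
}
```

An outbound call would then be wrapped as withRetry(() => callProvider(payload)), with callProvider standing in for the existing request function, and failureRate() checked periodically to decide when to alert.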