Our Redis instance red-c8mffn50mal9damtcgkg has been going from unavailable to available and back for hours now…
Right now our app is completely down because of that.
The support chat says “we’ll be back on Monday” which is not really great.
Could someone please help us ASAP? No reply from support yet…
It seems to have stabilized now, after being flaky for 12+ hours…
Update: unfortunately it’s still going on…
Despite the silence here, we’ve been in conversation with @manuelmeurer via a support ticket that was also opened about this.
Just to close out this thread here, here’s what happened:
Our investigation revealed that on Friday and Saturday, an autoscaler that we use to ensure we have enough capacity in our regions behaved unexpectedly in our Frankfurt region and was incorrectly scaling up and down the nodes that our Redis instances run on. As a result, customer services were incorrectly moved between nodes, which produced the behaviour you witnessed here. Actions have been taken to ensure that this behaviour does not repeat.
We also had a really good chat about support and what should be expected when things go wrong like this. We will do better!