I just migrated a couple of database (primary + replica) from Heroku and I’m facing with errors I never met on Heroku (I was also running on Postgres 11).
Some long running queries fail with:
** ERROR 40001 (serialization_failure) canceling statement due to conflict with recovery
User query might have needed to see row versions that must be removed.
I guess some replica settings are somewhat different between Heroku replica and yours, but I’m no DB expert. Can you help?
Thanks for letting us know about this! We took a look at the replication settings, and did find some ways we could better tune it (and other replicas, too) to alleviate this problem.
I’ve made the configuration changes on your database. Can you let me know if the problem is still occurring, or if things look good to you now?
Our nightly job ran successfully 3 times in a row without any incident, so I can assume it is fixed now.
For my knowledge, can you tell me what configuration changes you applied on my DB?
Glad to hear it! We tuned the max_standby_archive_delay and max_standby_streaming_delay postgres config parameters to allow for longer-running queries on replicas. We were initially conservative in the max delays, which would mean that long-running queries wouldn’t delay replication for newer queries, but that results in the query errors you were seeing.