We’ve been deploying our services many times a day recently, and we’ve started to see intermittent build failures from Docker’s BuildKit with error messages like these:
May 5 02:49:34 PM ==> Downloading cache...
May 5 02:49:56 PM ==> Downloaded 1.4GB in 18s. Extraction took 2s.
May 5 02:50:28 PM could not connect to unix:///run/user/1000/buildkit/buildkitd.sock after 30 trials
May 5 02:50:28 PM ========== log ==========
May 5 02:50:28 PM time="2022-05-05T21:50:19Z" level=info msg="auto snapshotter: using overlayfs"
May 5 02:50:28 PM time="2022-05-05T21:50:19Z" level=warning msg="NoProcessSandbox is enabled. Note that NoProcessSandbox allows build containers to kill (and potentially ptrace) an arbitrary process in the BuildKit host namespace. NoProcessSandbox should be enabled only when the BuildKit is running in a container as an unprivileged user."
May 5 03:18:37 PM ==> Downloading cache...
May 5 03:19:01 PM ==> Downloaded 1022MB in 15s. Extraction took 5s.
May 5 03:20:05 PM could not connect to unix:///run/user/1000/buildkit/buildkitd.sock after 30 trials
May 5 03:20:05 PM ========== log ==========
May 6 07:43:26 AM #12 [build-env 6/6] COPY . ./
May 6 07:43:26 AM #12 ERROR: failed to copy: rpc error: code = Internal desc = unexpected EOF
May 6 07:43:26 AM ------
May 6 07:43:26 AM > [build-env 6/6] COPY . ./:
May 6 07:43:26 AM ------
May 6 07:43:26 AM Dockerfile.render:37
May 6 07:43:26 AM --------------------
May 6 07:43:26 AM 35 | && find vendor/bundle/ruby/3.0.0/gems/ -name "*.c" -delete \
May 6 07:43:26 AM 36 | && find vendor/bundle/ruby/3.0.0/gems/ -name "*.o" -delete
May 6 07:43:26 AM 37 | >>> COPY . ./
May 6 07:43:26 AM 38 |
May 6 07:43:26 AM 39 |
May 6 07:43:26 AM --------------------
May 6 07:43:26 AM error: failed to solve: Internal: unexpected EOF
Retrying the deploy/build has always succeeded so far, so these errors appear to be transient.
Could these temporary build failures be retried automatically?
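In the meantime, where we control the build command ourselves (for example in a CI job, which may not be possible on a managed build), a minimal sketch of a retry with exponential backoff could look like the following. The `retry` and `docker_build` helpers, the `myapp` tag and the attempt/delay values are hypothetical placeholders, not an existing feature of the platform or of BuildKit.

```python
import subprocess
import time
from typing import Callable


def retry(fn: Callable[[], bool], attempts: int = 3, base_delay: float = 5.0,
          sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call fn until it returns True or the attempts are exhausted.

    Waits base_delay * 2**i seconds between attempts (exponential backoff)
    and does not wait after the final attempt. Returns True on success,
    False if every attempt failed.
    """
    for i in range(attempts):
        if fn():
            return True
        if i < attempts - 1:
            sleep(base_delay * 2 ** i)
    return False


def docker_build() -> bool:
    # Hypothetical build step: the Dockerfile path and "myapp" tag are placeholders.
    # Returns True only when `docker build` exits with status 0.
    result = subprocess.run(
        ["docker", "build", "-f", "Dockerfile.render", "-t", "myapp", "."]
    )
    return result.returncode == 0
```

Usage would be something like `retry(docker_build, attempts=3)`. Note that this retries on any non-zero exit, including genuine Dockerfile errors, so the attempt count should stay small; a platform-side retry could instead match specific transient errors such as the socket connection timeout or `unexpected EOF` above.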
Thanks,
Jason