Gatsby - build caching and image transformations

I’ve just started using Render and it seems great! I have an issue with my Gatsby static site: I’m using gatsby-image to transform many images, and it takes a very long time. I’ve read that cached images don’t need to be regenerated if the .cache and public directories remain intact and the GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES environment variable is set to true, but I’m getting this message:

We've detected that the Gatsby cache is incomplete (the .cache directory exists but the public directory does not). As a precaution, we're deleting your site's cache to ensure there's no stale data.

Is build caching in Gatsby in this way possible using Render?

Hello,

Sorry for the delay in responding. Render starts each build in a fresh environment, so there is no public directory present in the new build. A workaround is to copy the public directory out of the project directory into $XDG_CACHE_HOME so that it is included in Render’s build cache, and then copy it back into the project root on each new build. Note, though, that the reason we don’t persist things like public is to ensure that if you remove a file, it doesn’t get carried over into the new build. At worst, you’d have to use “Clear build cache & deploy” to remove the old public directory. Here is an example script:

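#!/usr/bin/env bash

# Exit on the first failing command, so a failed build is never cached
# and the deploy is marked as failed.
set -e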

build_with_cache() {
  if [[ -d "$XDG_CACHE_HOME"/public ]]; then
    echo "Copying cached public dir"
    rsync -a "$XDG_CACHE_HOME"/public/ public
  else
    echo "No cached public dir found"
  fi

  echo "Building"

  gatsby build

  echo "Done, caching public dir"
  rsync -a public/ "$XDG_CACHE_HOME"/public
}

if [[ "$RENDER" ]]; then
  build_with_cache
else
  gatsby build
fi

Be sure to run chmod u+x on the file, and then replace the build command with ./cache.sh (for example, if you name the script cache.sh).

Thanks for this, @Ralph! Would the Render team be willing to add this to their docs as a fully-fledged guide?

Hi @karlhorky-upleveled,

I think that’s a great idea. We’ve got a lot of docs we want to add, so I can’t add it right now, but I’ve added it to our list 🙂

I just tried this out today (although I’m experiencing unexplained build failures), and I think the cache saving and restoration is working…?

Here’s my version of the script:

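#!/usr/bin/env bash

# Exit if any command exits with a non-zero exit code,
# so a failed install or build never overwrites the cache
set -e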

# Ref: https://community.render.com/t/gatsby-build-caching-and-image-transformations/129/2

restore_render_cache() {
  local source_cache_dir="$1"
  if [[ -d "$XDG_CACHE_HOME/$source_cache_dir" ]]; then
    echo "CACHE HIT $source_cache_dir, rsyncing..."
    rsync -a "$XDG_CACHE_HOME/$source_cache_dir/" "$source_cache_dir"
  else
    echo "CACHE MISS $source_cache_dir"
  fi
}

save_render_cache() {
  local source_cache_dir="$1"
  echo "CACHE SAVE $source_cache_dir, rsyncing..."
  mkdir -p "$XDG_CACHE_HOME/$source_cache_dir"
  rsync -a "$source_cache_dir/" "$XDG_CACHE_HOME/$source_cache_dir"
}

install_and_build_with_cache() {
  restore_render_cache "node_modules"
  yarn --frozen-lockfile --production
  save_render_cache "node_modules"

  restore_render_cache ".cache"
  restore_render_cache "public"
  export GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES=true
  yarn gatsby build
  save_render_cache ".cache"
  save_render_cache "public"
}

install_and_build_with_cache

@Ralph @dan Should my script above be working? Even though the log says it’s saving to the cache (which should probably be over 1 GB), each build only downloads a minimal cache (624 MB, containing only the Yarn cache), and subsequent builds report only cache misses.

The service id is srv-brua640951caka6i3sd0

First build:

Jan 7 09:45:02 AM  CACHE SAVE node_modules, rsyncing...

Subsequent build:

Jan 7 09:51:54 AM  ==> Downloaded 624MB in 10s. Extraction took 22s.
...
Jan 7 09:52:10 AM  RENDER CACHE MISS node_modules

Hey @karlhorky-upleveled, that script looks like it should work as expected. We cache the node_modules directory by default, so it can be removed from the script. Caching the public directory is non-standard, which is why it has to be copied to $XDG_CACHE_HOME explicitly.
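Following that advice, the script posted earlier can be trimmed so that it caches only the Gatsby directories and leaves node_modules to Render’s default caching. This is a sketch, assuming (as in the scripts in this thread) that $XDG_CACHE_HOME points at the directory Render persists between builds and that the RENDER variable is only set on Render; the function name build_with_gatsby_cache is made up for this example. Variables are quoted so paths containing spaces are handled.

```shell
#!/usr/bin/env bash
# Stop at the first failing command so a failed build never refreshes the cache.
set -e

# Copy a previously cached directory back into the project, if one exists.
restore_render_cache() {
  local dir="$1"
  if [[ -d "$XDG_CACHE_HOME/$dir" ]]; then
    echo "CACHE HIT $dir, rsyncing..."
    mkdir -p "$dir"
    rsync -a "$XDG_CACHE_HOME/$dir/" "$dir"
  else
    echo "CACHE MISS $dir"
  fi
}

# Copy a project directory into the persisted cache location.
save_render_cache() {
  local dir="$1"
  echo "CACHE SAVE $dir, rsyncing..."
  mkdir -p "$XDG_CACHE_HOME/$dir"
  rsync -a "$dir/" "$XDG_CACHE_HOME/$dir"
}

# Hypothetical name: install, then build Gatsby with .cache and public restored.
build_with_gatsby_cache() {
  # node_modules is cached by Render by default, so it is only installed here.
  yarn --frozen-lockfile --production
  restore_render_cache ".cache"
  restore_render_cache "public"
  export GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES=true
  yarn gatsby build
  save_render_cache ".cache"
  save_render_cache "public"
}

# Run the cached build only on Render; elsewhere this just defines the helpers.
if [[ -n "${RENDER:-}" ]]; then
  build_with_gatsby_cache
fi
```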

I tried with a pared-down example:

#!/usr/bin/env bash

# Ref: https://community.render.com/t/gatsby-build-caching-and-image-transformations/129/2

restore_render_cache() {
  local source_cache_dir="$1"
  if [[ -d "$XDG_CACHE_HOME/$source_cache_dir" ]]; then
    echo "CACHE HIT $source_cache_dir, rsyncing..."
    rsync -a "$XDG_CACHE_HOME/$source_cache_dir/" "$source_cache_dir"
  else
    echo "CACHE MISS $source_cache_dir"
    echo "Creating empty dir"
    mkdir -p "$source_cache_dir"
  fi
}

save_render_cache() {
  local source_cache_dir="$1"
  echo "CACHE SAVE $source_cache_dir, rsyncing..."
  mkdir -p "$XDG_CACHE_HOME/$source_cache_dir"
  rsync -a "$source_cache_dir/" "$XDG_CACHE_HOME/$source_cache_dir"
}

install_and_build_with_cache() {
  restore_render_cache ".cache"
  restore_render_cache "public"

  echo ".cache contents"
  cat .cache/*
  echo "public contents"
  cat public/*


  echo "Writing files"

  echo $(date) >> .cache/somefile
  echo $(date) >> public/log.txt

  save_render_cache ".cache"
  save_render_cache "public"
}

install_and_build_with_cache

After a few deploys I see:

Jan 7 03:13:26 PM  ==> Running build command './cache.sh'...
Jan 7 03:13:26 PM  CACHE HIT .cache, rsyncing...
Jan 7 03:13:26 PM  CACHE HIT public, rsyncing...
Jan 7 03:13:26 PM  .cache contents
Jan 7 03:13:26 PM  Fri Jan 7 21:05:14 UTC 2022
Jan 7 03:13:26 PM  Fri Jan 7 21:06:35 UTC 2022
Jan 7 03:13:26 PM  Fri Jan 7 21:08:39 UTC 2022
Jan 7 03:13:26 PM  Fri Jan 7 21:11:25 UTC 2022
Jan 7 03:13:26 PM  public contents
Jan 7 03:13:26 PM  <!DOCTYPE html>
Jan 7 03:13:26 PM  <html>
Jan 7 03:13:26 PM    <body>
Jan 7 03:13:26 PM      <h1>This is just a test</h1>
Jan 7 03:13:26 PM    </body>
Jan 7 03:13:26 PM  </html>
Jan 7 03:13:26 PM  Fri Jan 7 21:11:25 UTC 2022
Jan 7 03:13:26 PM  Fri Jan 7 21:05:14 UTC 2022
Jan 7 03:13:26 PM  Fri Jan 7 21:06:35 UTC 2022
Jan 7 03:13:26 PM  Fri Jan 7 21:08:39 UTC 2022
Jan 7 03:13:26 PM  Writing files
Jan 7 03:13:26 PM  CACHE SAVE .cache, rsyncing...
Jan 7 03:13:26 PM  CACHE SAVE public, rsyncing...
Jan 7 03:13:41 PM  ==> Uploading build...
Jan 7 03:13:43 PM  ==> Your site is live 🎉

So it looks like the script works as expected. One thing to note is that if the build is canceled or fails, it might not get to the point where the cache is uploaded for later reuse.
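To make the failure case explicit, the build step can be wrapped so that the cache is refreshed only when the build command exits successfully, and the deploy fails otherwise. This is a minimal sketch: run_then_save is a hypothetical helper, and the save_render_cache stub below only echoes so the block runs on its own (swap in the rsync-based function from the scripts above). It does not help with canceled builds, which never reach the save step at all.

```shell
#!/usr/bin/env bash

# Stub for illustration; replace with the rsync-based save_render_cache above.
save_render_cache() { echo "CACHE SAVE $1"; }

# Usage: run_then_save DIR... -- COMMAND [ARGS...]
# Runs COMMAND and saves each DIR to the cache only if COMMAND succeeds.
run_then_save() {
  local dirs=()
  while [[ $# -gt 0 && "$1" != "--" ]]; do
    dirs+=("$1")
    shift
  done
  if [[ $# -eq 0 ]]; then
    echo "run_then_save: missing -- before the command" >&2
    return 2
  fi
  shift # drop the "--" separator
  if "$@"; then
    local dir
    for dir in "${dirs[@]}"; do
      save_render_cache "$dir"
    done
  else
    local status=$?
    echo "Build failed (exit $status); not updating the cache" >&2
    return "$status"
  fi
}
```

For example: run_then_save .cache public -- yarn gatsby build. Because the failing exit status is returned, a script running under set -e will stop there and the deploy is reported as failed.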

How can I verify this? If I open a shell, I can’t see node_modules in the $XDG_CACHE_HOME location. Or is it only there during the build and then removed from the final environment where the shell runs?

Render’s infrastructure around uploading and deploying is still somewhat of a mystery to me…
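A low-tech way to see what the cache contains is to print it at the very start of the build command, right after the restored cache has been downloaded (whether it is also visible from a shell on the running service is not confirmed in this thread). This is a sketch; report_render_cache is a hypothetical name. It uses find rather than a * glob so that hidden entries such as .cache are listed too.

```shell
#!/usr/bin/env bash

# Print each top-level entry of the build cache together with its size.
report_render_cache() {
  local root="${XDG_CACHE_HOME:-}"
  if [[ -z "$root" ]]; then
    echo "XDG_CACHE_HOME is not set" >&2
    return 1
  fi
  echo "==> Contents of $root:"
  # -mindepth 1 skips the root itself; hidden dirs such as .cache are included.
  if [[ -z "$(find "$root" -mindepth 1 -maxdepth 1 -print -quit 2>/dev/null)" ]]; then
    echo "(empty or missing)"
    return 0
  fi
  find "$root" -mindepth 1 -maxdepth 1 -exec du -sh {} +
}
```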

Another detail: I have recursive node_modules in my version (e.g. I cache all packages/*/node_modules directories, since I’m using Yarn Workspaces). See the script below. I didn’t include it in the original script since it probably isn’t useful to everyone.

#!/usr/bin/env bash

# Exit if any command exits with a non-zero exit code
set -e

# Caching code courtesy of https://community.render.com/t/gatsby-build-caching-and-image-transformations/129/2

restore_render_cache() {
  local source_cache_dir="$1"
  if [[ -d "$XDG_CACHE_HOME/$source_cache_dir" ]]; then
    echo "CACHE HIT $source_cache_dir, rsyncing..."
    rsync -a "$XDG_CACHE_HOME/$source_cache_dir/" "$source_cache_dir"
  else
    echo "CACHE MISS $source_cache_dir"
  fi
}

export -f restore_render_cache

save_render_cache() {
  local source_cache_dir="$1"
  echo "CACHE SAVE $source_cache_dir, rsyncing..."
  mkdir -p "$XDG_CACHE_HOME/$source_cache_dir"
  rsync -a "$source_cache_dir/" "$XDG_CACHE_HOME/$source_cache_dir"
}

export -f save_render_cache

install_and_build_with_cache() {
  restore_render_cache "node_modules"
  # All node_modules dirs at all depths, except for in Yarn cache dir
  find "$XDG_CACHE_HOME" -type d \( -name 'node_modules' -and -not -path "$(yarn cache dir)*" \) -prune -exec /bin/bash -c '
    for path in "$@" ; do
      restore_render_cache "${path/$XDG_CACHE_HOME\//}"
    done
  ' {} +

  yarn --frozen-lockfile --production

  save_render_cache "node_modules"
  # All node_modules dirs at all depths
  find . -name 'node_modules' -type d -prune -exec /bin/bash -c '
    for path in "$@" ; do
      save_render_cache "${path/.\//}"
    done
  ' {} +

  restore_render_cache "packages/website/.cache"
  restore_render_cache "packages/website/public"

  yarn gatsby build

  save_render_cache "packages/website/.cache"
  save_render_cache "packages/website/public"
}

install_and_build_with_cache
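The export -f plus bash -c combination works, but the same traversal can also be written as a find -print0 loop that calls the functions directly in the current shell, which avoids exporting them and copes with unusual characters in paths. This is a sketch: save_all_node_modules and restore_all_node_modules are hypothetical names, the save/restore stubs only echo (swap in the rsync versions above), and the optional argument to the restore function plays the role of the $(yarn cache dir) exclusion.

```shell
#!/usr/bin/env bash

# Stubs for illustration; replace with the rsync-based versions above.
save_render_cache() { echo "CACHE SAVE $1"; }
restore_render_cache() { echo "CACHE RESTORE $1"; }

# Save every node_modules dir below the current directory.
# -prune stops find from descending into a matched node_modules dir, so
# dependency-level dirs (node_modules/x/node_modules) are not listed separately.
save_all_node_modules() {
  local path
  while IFS= read -r -d '' path; do
    save_render_cache "${path#./}"
  done < <(find . -type d -name node_modules -prune -print0)
}

# Restore every node_modules dir found in the cache, skipping anything under
# an optional excluded prefix (for example the output of yarn cache dir).
restore_all_node_modules() {
  local exclude="${1:-}" path
  while IFS= read -r -d '' path; do
    if [[ -n "$exclude" && "$path" == "$exclude"* ]]; then
      continue
    fi
    restore_render_cache "${path#"$XDG_CACHE_HOME"/}"
  done < <(find "$XDG_CACHE_HOME" -type d -name node_modules -prune -print0)
}
```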

I’ll try removing just the root node_modules from the script, since I assume Render caches only that one and not all the nested node_modules directories.

I’m assuming this because when I run yarn install --frozen-lockfile --production it still takes about 140 seconds:

# yarn install starts here
Jan 8 03:10:58 PM   [1/5] Validating package.json...
Jan 8 03:10:58 PM   [2/5] Resolving packages...
...
Jan 8 03:10:59 PM   [3/5] Fetching packages...
...
Jan 8 03:11:01 PM   [4/5] Linking dependencies...
...
Jan 8 03:13:23 PM   [5/5] Building fresh packages...
# yarn install done here

With proper caching of all node_modules folders (also recursively) I’ve seen yarn install take only 0-2 seconds.
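When comparing a cache hit against a miss, it can help to print the duration of the install step directly in the build log rather than working it out from line timestamps. A small sketch using bash’s built-in SECONDS counter; time_step is a hypothetical helper name.

```shell
#!/usr/bin/env bash

# Run a command, then log how long it took; the command's exit status is kept.
time_step() {
  local label="$1"
  shift
  local start=$SECONDS
  local status=0
  "$@" || status=$?
  echo "$label took $((SECONDS - start))s"
  return "$status"
}
```

For example: time_step "yarn install" yarn --frozen-lockfile --production.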

I tried removing the rsync of only the root node_modules folder, and this seemed to help (no more deploy failures with “Cause of build failure could not be determined”).

Maybe the above error is actually a bug in Render’s deploy scripts: try rsyncing node_modules to $XDG_CACHE_HOME and watch all Render deploys fail.


However, removing rsync for the root node_modules didn’t help with the cache misses:

Jan 8 12:39:45 PM  CACHE MISS packages/website/.cache
Jan 8 12:39:45 PM  CACHE MISS packages/website/public

It seems like there’s nothing in the cache actually…

Jan 8 12:35:00 PM  ==> Downloading cache...
Jan 8 12:35:07 PM  ==> Detected Node version 16.2.0

@Ralph is it a problem if I’m rsyncing deep directories recursively like this? Or is it an issue that the paths in $XDG_CACHE_HOME partially match the paths on disk? (for example, packages/website is a folder in the Git repo, and $XDG_CACHE_HOME/packages/website/node_modules are the node modules cached for that Yarn workspace)

One potential solution would be for the save_render_cache function to replace the slashes in the paths with some other separator (e.g. changing / to -), to avoid these problems, and for restore_render_cache to convert them back to slashes. But I would only go ahead with such a solution if you can confirm that the scenarios I described above are problematic.
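The separator idea can be sketched with bash parameter expansion. A plain - is ambiguous, because real paths such as packages/my-site already contain dashes and could not be decoded reliably, so this sketch percent-encodes instead: % becomes %25 first, then / becomes %2F, and decoding undoes the steps in reverse order so every path round-trips. The function names are hypothetical, and this thread does not confirm that flattening the paths actually avoids the cache misses.

```shell
#!/usr/bin/env bash

# Turn a relative path into a single flat directory name.
encode_cache_key() {
  local key=$1
  key=${key//\%/%25}   # escape existing percent signs first
  key=${key//\//%2F}   # then replace every slash
  printf '%s\n' "$key"
}

# Reverse encode_cache_key.
decode_cache_key() {
  local path=$1
  path=${path//\%2F/\/}  # restore slashes first
  path=${path//\%25/%}   # then unescape percent signs
  printf '%s\n' "$path"
}
```

For example, encode_cache_key packages/website/.cache prints packages%2Fwebsite%2F.cache, which could then be used as a single directory name under $XDG_CACHE_HOME.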


Edit: Interesting, it seems that if I remove all the references to the recursive node_modules, then after 2 or 3 deploys the Gatsby cache starts hitting instead of missing (see below). I’m not sure why it doesn’t happen immediately, though. Do you have any idea? Maybe the Render cache takes a while to warm up?

Jan 8 03:13:34 PM   CACHE HIT packages/website/.cache, rsyncing...
Jan 8 03:13:45 PM   CACHE HIT packages/website/public, rsyncing...

@Ralph this seems like a common problem with using $XDG_CACHE_HOME on Render’s infrastructure - just ran into it again now.

The first 4 deploys were missing the folder in $XDG_CACHE_HOME, even though I had rsynced the folder there after each build. Then, on the 5th deploy or so, it appeared.
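One way to gather evidence for this kind of delay is to append a small marker to the cache on every build and print whatever markers the next build finds. If a build shows fewer timestamps than the number of earlier successful deploys, it received an older copy of the cache. This is a sketch; the cache-marker.txt file name and the check_cache_marker helper are hypothetical, and it assumes $XDG_CACHE_HOME is the directory Render persists between builds.

```shell
#!/usr/bin/env bash

# Show the markers left by earlier builds, then append one for this build.
check_cache_marker() {
  if [[ -z "${XDG_CACHE_HOME:-}" ]]; then
    echo "XDG_CACHE_HOME is not set" >&2
    return 1
  fi
  local marker="$XDG_CACHE_HOME/cache-marker.txt"
  if [[ -f "$marker" ]]; then
    echo "Cache marker from a previous build:"
    cat "$marker"
  else
    echo "No cache marker found"
  fi
  mkdir -p "$XDG_CACHE_HOME"
  # One UTC timestamp per build; compare against the times of earlier deploys.
  date -u '+%Y-%m-%dT%H:%M:%SZ' >> "$marker"
}
```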

Since the “Render build cache” seems to be largely undocumented, could you explain how it works and possibly help with a solution to this problem?

Second question about the build cache: can I sync things there after the initial build as well? E.g. once the server is running, can I rsync new data there, or will it disappear? It’s really hard to debug this because of the lack of documentation and the inconsistent behavior described above.

Lastly, during deployment, is there any other way to get access to the files that are in the previous deployment? (the files in the running instance)
