I’ve just started using Render and it seems great! I have an issue with my Gatsby static site: I’m using gatsby-image to transform many images, and it takes a very long time. I’ve read that cached images don’t need to be regenerated if the .cache and public directories are kept intact between builds and the GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES environment variable is set to true, but I’m getting this message:
We've detected that the Gatsby cache is incomplete (the .cache directory exists but the public directory does not). As a precaution, we're deleting your site's cache to ensure there's no stale data.
Is build caching in Gatsby in this way possible using Render?
Sorry for the delay in responding. Render starts each build in a fresh environment, so no public dir is present in the new build. A workaround is to copy the public dir out of the project dir into $XDG_CACHE_HOME after the build, so it is saved with the Render build cache, and then copy it back to the project root at the start of each new build. Note, though, that the reason we don’t persist things like public is to ensure that if you remove a file, it doesn’t get carried over into the new build. At worst, you’d have to use “Clear build cache & deploy” to remove the old public dir. Here is an example script:
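#!/usr/bin/env bash
# Stop on the first failing command so a broken build is never cached
set -e
build_with_cache() {
  # Restore the public dir saved by the previous build, if there is one
  if [[ -d "$XDG_CACHE_HOME/public" ]]; then
    echo "Copying cached public dir"
    rsync -a "$XDG_CACHE_HOME/public/" public
  else
    echo "No cached public dir found"
  fi
  echo "Building"
  gatsby build
  # Save the freshly built public dir for the next build
  echo "Done, caching public dir"
  mkdir -p "$XDG_CACHE_HOME/public"
  rsync -a public/ "$XDG_CACHE_HOME/public"
}
# Only use the cache workaround when running on Render
if [[ "$RENDER" ]]; then
  build_with_cache
else
  gatsby build
fi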
Be sure to make the file executable (chmod u+x cache.sh), and then set the build command to ./cache.sh (assuming you named the script cache.sh).
@Ralph @dan Should my script above be working? Even though the log says it’s saving to the cache (which should probably be over 1GB), subsequent builds only download a minimal cache (624MB, containing only the Yarn cache) and every lookup is a cache miss.
The service id is srv-brua640951caka6i3sd0
First build:
Jan 7 09:45:02 AM CACHE SAVE node_modules, rsyncing...
Subsequent build:
Jan 7 09:51:54 AM ==> Downloaded 624MB in 10s. Extraction took 22s.
...
Jan 7 09:52:10 AM RENDER CACHE MISS node_modules
Hey @karlhorky-upleveled, that script looks like it should work as expected. We cache the node_modules directory by default, so it can be removed from the script. Caching the public directory is non-standard, which is why it has to be copied to $XDG_CACHE_HOME explicitly.
I tried with a pared-down example:
#!/usr/bin/env bash
# Ref: https://community.render.com/t/gatsby-build-caching-and-image-transformations/129/2
restore_render_cache() {
  local source_cache_dir="$1"
  if [[ -d "$XDG_CACHE_HOME/$source_cache_dir" ]]; then
    echo "CACHE HIT $source_cache_dir, rsyncing..."
    rsync -a "$XDG_CACHE_HOME/$source_cache_dir/" "$source_cache_dir"
  else
    echo "CACHE MISS $source_cache_dir"
    echo "Creating empty dir"
    mkdir -p "$source_cache_dir"
  fi
}
save_render_cache() {
  local source_cache_dir="$1"
  echo "CACHE SAVE $source_cache_dir, rsyncing..."
  mkdir -p "$XDG_CACHE_HOME/$source_cache_dir"
  rsync -a "$source_cache_dir/" "$XDG_CACHE_HOME/$source_cache_dir"
}
install_and_build_with_cache() {
  restore_render_cache ".cache"
  restore_render_cache "public"
  echo ".cache contents"
  cat .cache/*
  echo "public contents"
  cat public/*
  # Append a timestamp in each dir so later deploys show whether it persisted
  echo "Writing files"
  date >> .cache/somefile
  date >> public/log.txt
  save_render_cache ".cache"
  save_render_cache "public"
}
install_and_build_with_cache
After a few deploys I see:
Jan 7 03:13:26 PM ==> Running build command './cache.sh'...
Jan 7 03:13:26 PM CACHE HIT .cache, rsyncing...
Jan 7 03:13:26 PM CACHE HIT public, rsyncing...
Jan 7 03:13:26 PM .cache contents
Jan 7 03:13:26 PM Fri Jan 7 21:05:14 UTC 2022
Jan 7 03:13:26 PM Fri Jan 7 21:06:35 UTC 2022
Jan 7 03:13:26 PM Fri Jan 7 21:08:39 UTC 2022
Jan 7 03:13:26 PM Fri Jan 7 21:11:25 UTC 2022
Jan 7 03:13:26 PM public contents
Jan 7 03:13:26 PM <!DOCTYPE html>
Jan 7 03:13:26 PM <html>
Jan 7 03:13:26 PM <body>
Jan 7 03:13:26 PM <h1>This is just a test</h1>
Jan 7 03:13:26 PM </body>
Jan 7 03:13:26 PM </html>
Jan 7 03:13:26 PM Fri Jan 7 21:11:25 UTC 2022
Jan 7 03:13:26 PM Fri Jan 7 21:05:14 UTC 2022
Jan 7 03:13:26 PM Fri Jan 7 21:06:35 UTC 2022
Jan 7 03:13:26 PM Fri Jan 7 21:08:39 UTC 2022
Jan 7 03:13:26 PM Writing files
Jan 7 03:13:26 PM CACHE SAVE .cache, rsyncing...
Jan 7 03:13:26 PM CACHE SAVE public, rsyncing...
Jan 7 03:13:41 PM ==> Uploading build...
Jan 7 03:13:43 PM ==> Your site is live 🎉
So the script appears to work as expected. One thing to note is that if the build is canceled or fails, it might never reach the point where the cache is saved for later reuse.
How can I verify this? If I open a shell, I can’t see node_modules in the $XDG_CACHE_HOME location. Or is it only there during build and then removed from the final environment where the shell is started?
Render’s infrastructure around uploading and deploying is still somewhat of a mystery to me…
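One way to check from inside the build itself, rather than from a shell on the running service, is to log what $XDG_CACHE_HOME contains at the start and end of the build script. A small sketch, assuming du and sort are available in the build image; the function name report_render_cache is hypothetical, and the depth argument defaults to 2.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: print what the build can see in $XDG_CACHE_HOME,
# with human-readable sizes, down to a given directory depth.
report_render_cache() {
  local max_depth="${1:-2}"
  echo "XDG_CACHE_HOME=${XDG_CACHE_HOME:-<unset>}"
  if [[ -z "${XDG_CACHE_HOME:-}" || ! -d "$XDG_CACHE_HOME" ]]; then
    echo "Cache directory is missing"
    return 0
  fi
  # -d limits the listing depth (supported by both GNU and BSD du)
  du -h -d "$max_depth" "$XDG_CACHE_HOME" 2>/dev/null | sort -k2
}
```

Calling report_render_cache 3 right after the restore step and again right after the save step shows, in the deploy log, whether a directory saved by the previous build was actually present when the next one started.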
Another detail: my version also caches nested node_modules directories (e.g. all packages/*/node_modules directories, since I’m using Yarn Workspaces); see the script below. I left this out of the script I originally posted since it probably isn’t useful to everyone.
#!/usr/bin/env bash
# Exit if any command exits with a non-zero exit code
set -e
# Caching code courtesy of https://community.render.com/t/gatsby-build-caching-and-image-transformations/129/2
restore_render_cache() {
  local source_cache_dir="$1"
  if [[ -d "$XDG_CACHE_HOME/$source_cache_dir" ]]; then
    echo "CACHE HIT $source_cache_dir, rsyncing..."
    rsync -a "$XDG_CACHE_HOME/$source_cache_dir/" "$source_cache_dir"
  else
    echo "CACHE MISS $source_cache_dir"
  fi
}
# Export so the function is visible inside the bash -c subshells below
export -f restore_render_cache
save_render_cache() {
  local source_cache_dir="$1"
  echo "CACHE SAVE $source_cache_dir, rsyncing..."
  mkdir -p "$XDG_CACHE_HOME/$source_cache_dir"
  rsync -a "$source_cache_dir/" "$XDG_CACHE_HOME/$source_cache_dir"
}
export -f save_render_cache
install_and_build_with_cache() {
  restore_render_cache "node_modules"
  # All node_modules dirs at all depths, except for in Yarn cache dir.
  # The "_" fills $0 of bash -c, so "$@" receives every path found.
  find "$XDG_CACHE_HOME" -type d \( -name 'node_modules' -and -not -path "$(yarn cache dir)*" \) -prune -exec /bin/bash -c '
    for path in "$@"; do
      restore_render_cache "${path/$XDG_CACHE_HOME\//}"
    done
  ' _ {} +
  yarn --frozen-lockfile --production
  save_render_cache "node_modules"
  # All node_modules dirs at all depths
  find . -name 'node_modules' -type d -prune -exec /bin/bash -c '
    for path in "$@"; do
      save_render_cache "${path/.\//}"
    done
  ' _ {} +
  restore_render_cache "packages/website/.cache"
  restore_render_cache "packages/website/public"
  yarn gatsby build
  save_render_cache "packages/website/.cache"
  save_render_cache "packages/website/public"
}
install_and_build_with_cache
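A note on the find ... -exec /bin/bash -c '...' {} + pattern used for the nested node_modules: bash -c assigns the first argument after the script string to $0, not $1, so "$@" silently skips the first path find passes unless a placeholder argument such as _ comes before {}. The sketch below (hypothetical function names) shows the difference on a throwaway directory tree.

```shell
#!/usr/bin/env bash
# Both functions list node_modules dirs under $1 via find -exec bash -c.
# Without a placeholder, the first path becomes $0 and is dropped from "$@".
list_without_placeholder() {
  find "$1" -name 'node_modules' -type d -prune -exec bash -c '
    for path in "$@"; do echo "$path"; done
  ' {} + | sort
}

# With "_" as $0, every path found lands in "$@".
list_with_placeholder() {
  find "$1" -name 'node_modules' -type d -prune -exec bash -c '
    for path in "$@"; do echo "$path"; done
  ' _ {} + | sort
}
```

With three node_modules directories in the tree, the first function prints two paths and the second prints all three, which would leave one workspace’s node_modules unrestored or unsaved on every build.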
I’ll try removing only the caching of the root node_modules, since Render already caches that one by default, but I assume it doesn’t cache the nested node_modules directories?
I’m assuming this because when I run yarn install --frozen-lockfile --production it still takes about 140 seconds:
# yarn install starts here
Jan 8 03:10:58 PM [1/5] Validating package.json...
Jan 8 03:10:58 PM [2/5] Resolving packages...
...
Jan 8 03:10:59 PM [3/5] Fetching packages...
...
Jan 8 03:11:01 PM [4/5] Linking dependencies...
...
Jan 8 03:13:23 PM [5/5] Building fresh packages...
# yarn install done here
With proper caching of all node_modules folders (also recursively) I’ve seen yarn install take only 0-2 seconds.
Maybe the problem above is actually a bug in Render’s deploy scripts: try rsyncing node_modules to $XDG_CACHE_HOME and watch all Render deploys fail.
However, removing rsync for the root node_modules didn’t help with the cache misses:
Jan 8 12:39:45 PM CACHE MISS packages/website/.cache
Jan 8 12:39:45 PM CACHE MISS packages/website/public
It seems like there’s actually nothing in the cache:
Jan 8 12:35:00 PM ==> Downloading cache...
Jan 8 12:35:07 PM ==> Detected Node version 16.2.0
@Ralph is it a problem if I’m rsyncing deep directories recursively like this? Or is it an issue that the paths in $XDG_CACHE_HOME partially match the paths on disk? (For example, packages/website is a folder in the Git repo, and $XDG_CACHE_HOME/packages/website/node_modules contains the cached node modules for that Yarn workspace.)
One potential solution would be for the save_render_cache function to replace the slashes in the paths with some other separator (e.g. changing / to - or something similar) to avoid these problems, and for restore_render_cache to convert them back to slashes. But I would only go ahead with such a solution if you can confirm that the scenarios I described above are problematic.
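If the flattening idea turns out to be needed, the separator has to be reversible: a plain - would be ambiguous, because - already appears in many package and folder names. A minimal sketch, assuming sed is available and paths contain no newlines; the function names are hypothetical. It percent-escapes % first and then /, so decoding in the opposite order recovers the original path exactly.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: turn a relative path into a single directory name
# (no slashes) and back. "%" is escaped first so the mapping is reversible.
encode_cache_key() {
  printf '%s\n' "$1" | sed -e 's/%/%25/g' -e 's|/|%2F|g'
}

decode_cache_key() {
  printf '%s\n' "$1" | sed -e 's|%2F|/|g' -e 's/%25/%/g'
}
```

A save step would then copy into "$XDG_CACHE_HOME/$(encode_cache_key "$dir")" and a restore step would iterate over the top-level entries of $XDG_CACHE_HOME, decoding each name back to its path. Nothing in this thread confirms that nested paths are the cause of the misses, so this is only worth trying if that is established.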
Edit: Interesting: it seems that if I remove all references to the nested node_modules, then after 2 or 3 deploys the Gatsby cache starts hitting instead of missing (see below). I’m not sure why it isn’t immediate, though; do you have any idea? Maybe the Render cache takes a while to warm up / become ready?
Jan 8 03:13:34 PM CACHE HIT packages/website/.cache, rsyncing...
Jan 8 03:13:45 PM CACHE HIT packages/website/public, rsyncing...
@Ralph this seems to be a recurring problem with using $XDG_CACHE_HOME on Render’s infrastructure; I just ran into it again.
The first 4 deploys were missing the folder in $XDG_CACHE_HOME, even though I rsynced it there after each build. Then, on roughly the 5th deploy, it appeared.
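To pin down how many deploys it takes for a save to show up, each build could append one line to a log file inside the cache and print the latest line it finds at startup. If build N reports the stamp written by build N-1, the previous save arrived; if it reports an older stamp or none, it did not. A sketch under assumptions: the file name build-stamps.log and the function names are hypothetical, and RENDER_GIT_COMMIT is assumed to be set by Render during builds, with "unknown" used as a fallback.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: keep one line per build in the cache directory.
stamp_file() {
  printf '%s/build-stamps.log\n' "${XDG_CACHE_HOME:?XDG_CACHE_HOME is not set}"
}

# Call at the start of the build: show which earlier stamps were restored.
show_restored_stamps() {
  local file
  file="$(stamp_file)"
  if [[ -f "$file" ]]; then
    echo "Restored cache contains $(wc -l < "$file" | tr -d ' ') build stamp(s); latest:"
    tail -n 1 "$file"
  else
    echo "No build stamps in restored cache"
  fi
}

# Call at the very end of the build, after the last save.
append_build_stamp() {
  local file
  file="$(stamp_file)"
  mkdir -p "$(dirname "$file")"
  # RENDER_GIT_COMMIT is assumed to be provided by Render; falls back to "unknown"
  printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "${RENDER_GIT_COMMIT:-unknown}" >> "$file"
}
```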
Since the “Render build cache” seems to be pretty undocumented, could you expand on how it works here and potentially help out with a solution for this problem?
Second question about the build cache: can I sync things there after the initial build as well? E.g. once the server is running, can I rsync new data there, or will it disappear? It’s really hard to debug this because of the lack of documentation and the weird, inconsistent behavior described above.
Lastly, during deployment, is there any other way to get access to the files that are in the previous deployment? (the files in the running instance)