Generating _site with consistent modification time to avoid re-deploying unchanged content

Hello,

I have a couple of static websites that I generate with Jekyll and upload to a GCS bucket.

Today, for deployment, I have a shell script that does some validation, parses a bucket name out of _config.yml, and essentially runs:

JEKYLL_ENV=production jekyll build --incremental
gsutil -m rsync -d -r ./_site gs://$GCS_BUCKET

This works great, but an interesting question recently sprang to mind:

What happens if I have a large site, and I want to make a change to it from a different computer, or if my ‘_site’ directory gets nuked for any number of reasons?

In the happy case (_site just got nuked), files that get copied straight over (e.g. images) will retain their mtimes, but files that get generated by Jekyll (e.g. posts) will be new files with new mtimes. As a result, even if nothing changed, or if I just changed a single post, gsutil would want to re-upload all generated HTML.

In the unhappy case (separate machine, have to git clone my repo from scratch), pretty much every file will have a new mtime, so the entire site has to be re-uploaded, even if it hasn’t changed.

Is this a problem someone has solved, or are people just living with it because it’s not that big of a deal in practice?

I suppose the git clone thing is kind of out of scope of Jekyll, but it would be neat if there were a plugin/setting where all generated content has its mtime set to the latest mtime of its dependencies as part of the build process.

cheers!

I think jekyll build nukes the _site folder each time it is run, though I suppose not if you are using --incremental, which I think is not 100% reliable just yet.

I use AWS S3 and the s3_website gem to deploy. I believe it checks the hash (?) of each file to determine which ones need to be uploaded, so wiping out the site and regenerating doesn’t matter: only the files that changed are uploaded.
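To illustrate the idea of comparing content instead of timestamps, here is a minimal sketch in shell. It is not how s3_website is implemented; `write_manifest` and `changed_files` are hypothetical helper names, and the sketch assumes GNU `sha256sum`, `comm`, and `cut` are available.

```shell
# write_manifest DIR: print one "hash  ./relative/path" line per file
# under DIR, sorted so that two manifests can be compared with comm.
write_manifest() {
  ( cd "$1" && find . -type f -exec sha256sum {} + | LC_ALL=C sort )
}

# changed_files OLD NEW: print the paths that are new, or whose content
# hash differs, in manifest NEW compared with manifest OLD.
# Deleted files are not reported by this sketch.
changed_files() {
  # comm -13 keeps lines found only in NEW; a sha256sum line is
  # 64 hex characters plus two separator characters, so the path
  # starts at column 67.
  LC_ALL=C comm -13 "$1" "$2" | cut -c67-
}
```

Saving the manifest after each deploy and comparing it with the manifest of a fresh build would list only the files whose content really changed, no matter what their mtimes say.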

This is a solved problem if you use rsync with the correct settings.

For my Jekyll site I use Travis CI to build and rsync files to my webserver. On every Git commit, Travis clones my repo, builds the site fresh with no cached _site folder, and then rsyncs over only the new or updated files.
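The post above does not say which settings it means. One option that fits a fresh build, where every mtime is new, is rsync’s checksum mode. A minimal sketch, assuming that is the intent; `sync_site` is a hypothetical helper name and the remote path in the comment is only a placeholder:

```shell
# sync_site SRC DEST: copy the contents of SRC into DEST, comparing
# files by checksum. DEST can be a local path or a remote such as
# user@host:/var/www/site/ (placeholder, not a real server).
sync_site() {
  # -r        recurse into directories
  # -c        skip files whose checksum matches, regardless of mtime
  # -i        print one line per item that rsync changes
  # --delete  remove files from DEST that no longer exist in SRC
  rsync -rci --delete "${1%/}/" "$2"
}
```

Because `-c` reads every file on both sides to compute checksums, it trades extra disk reads for fewer transfers, which is usually a good deal for a site of modest size.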

Ah, that’s right. I took a closer look at the gsutil manual, and it looks like there’s a ‘-c’ option that makes it compare checksums instead of mtimes when the file sizes match. That’ll handle correctness, thanks!
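For reference, a sketch of how the upload step from the first post might look with ‘-c’ added. `deploy_site` and the `DRY_RUN` variable are hypothetical names introduced here so the command can be inspected before it runs; the gsutil flags are the ones already in use, plus ‘-c’.

```shell
# deploy_site BUCKET: sync ./_site to gs://BUCKET, comparing checksums
# (-c) so that regenerated but unchanged files are not re-uploaded.
# With DRY_RUN=1 (an assumed convention), print the command instead.
deploy_site() {
  # "$1" is expanded before set replaces the positional parameters.
  set -- gsutil -m rsync -c -d -r ./_site "gs://$1"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

Running `DRY_RUN=1 deploy_site "$GCS_BUCKET"` once before a real deploy is a cheap way to confirm the bucket name was parsed correctly, since ‘-d’ deletes remote files that are missing locally.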

My client’s hosting company has rsync installed, and I have passwordless SSH access, but they claim that to use rsync I need the dedicated server package. When I try to upload with rsync, the command times out; the same command works on DreamHost, which is also shared hosting. Can I get rsync to work, or is there a similar ‘upload the whole _site folder contents in one go’ method?

Solved: I had to make a folder for the site (it was directly in /). Now rsync works fine.