How to check for un-linked/dead files in _site

Over in the “Core feature wishlist” topic, @michaelbach mentioned wanting a way to detect dead or un-linked files in a Jekyll-generated site. I wanted to check for dead files too, and found a relatively simple way to do it with wget and diff (wget is available from most package managers: Homebrew, apt, etc.).

Run the following commands from a site’s source directory while bundle exec jekyll serve is running (i.e. in another terminal window):

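# assumes jekyll serve is listening at its default address; adjust the URL if yours differs
wget -r -nv -nH -P /tmp/sitecopy http://localhost:4000/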
diff -r _site /tmp/sitecopy

Wget makes a recursive copy of all files reachable from the site’s root into a temp directory. Then diff recursively compares it to the generated _site directory. Here are the same commands with human-readable switches:

wget --recursive --no-verbose --no-host-directories \
     --directory-prefix=/tmp/sitecopy http://localhost:4000/
diff --recursive _site /tmp/sitecopy

The resulting output (below) shows un-linked files. Not all un-linked files are “dead”. In the example below, the custom 404 page is live (on GitHub Pages), and the CSS map is for debugging. These can be excluded from the diff (second run below). The last remaining file, assets/minima-social-icons.svg, is un-linked and unused/dead, so it could be removed from the site (via exclude in _config.yml):

$ diff --recursive _site /tmp/sitecopy
Only in _site: 404.html
Only in _site/assets:
Only in _site/assets: minima-social-icons.svg

$ diff -r -x '404.html' -x '*.map' _site /tmp/sitecopy
Only in _site/assets: minima-social-icons.svg

The wget/diff check can be integrated into a build system, for example as a Makefile target.
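As a rough sketch of what such an integration might look like, the two steps can be wrapped in shell functions that a Makefile target or CI job calls. The names crawl_site, check_unlinked, SITE_DIR, COPY_DIR and SITE_URL are made up for illustration, and the URL assumes jekyll serve is on its default port:

```shell
#!/bin/sh
# Sketch only: the function and variable names are illustrative, and
# SITE_URL assumes jekyll serve is listening on its default port.
SITE_DIR=${SITE_DIR:-_site}
COPY_DIR=${COPY_DIR:-/tmp/sitecopy}
SITE_URL=${SITE_URL:-http://localhost:4000/}

# Fetch a fresh recursive copy of the served site into COPY_DIR.
crawl_site() {
    # Remove any stale copy first so old downloads cannot hide un-linked files.
    rm -rf -- "${COPY_DIR:?}"
    wget --recursive --no-verbose --no-host-directories \
         --directory-prefix="$COPY_DIR" "$SITE_URL"
}

# Print diff's "Only in" lines for files that exist under the first tree
# but not the second. Returns 1 if any are found, 0 otherwise.
check_unlinked() {
    site=${1%/}   # drop a trailing slash so the prefix match below works
    copy=${2%/}
    # LC_ALL=C keeps diff's "Only in" wording the same in every locale.
    unlinked=$(LC_ALL=C diff -r -x '404.html' -x '*.map' "$site" "$copy" |
        awk -v prefix="Only in $site" '
            index($0, prefix) == 1 {
                next_char = substr($0, length(prefix) + 1, 1)
                if (next_char == "/" || next_char == ":") print
            }')
    if [ -n "$unlinked" ]; then
        printf '%s\n' "$unlinked"
        return 1
    fi
    return 0
}
```

With the server running, a target could source this file and run crawl_site && check_unlinked "$SITE_DIR" "$COPY_DIR". The non-zero return status when un-linked files remain should be enough to fail the build, and the -x patterns are where the known-live files go.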
