On most pages on my website, I have a footer which recommends similar items in the archive. Since I have very little text about these items, and lots of rich metadata, I decided to implement my own content similarity engine and, since I was using GitHub Pages, I did this entirely in Liquid.
Fast forward a few months, and the site now has over a thousand pieces of content. Since the similarity algo is O(n^2), build times blew up, and the other day they exceeded the GitHub Pages timeout of 10 minutes per build.
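To make the quadratic growth concrete, here is a minimal Ruby sketch of a metadata-based recommender. It is not the site's actual Liquid code: the `ARCHIVE` data, the `tags` field, the Jaccard scoring, and the `similarity` / `recommendations` helpers are all assumptions made up for illustration.

```ruby
require "set"

# Hypothetical archive entries; the real site's metadata fields are not shown here.
ARCHIVE = [
  { "slug" => "a", "tags" => %w[ruby jekyll liquid] },
  { "slug" => "b", "tags" => %w[ruby jekyll] },
  { "slug" => "c", "tags" => %w[python] },
  { "slug" => "d", "tags" => %w[liquid jekyll] }
].freeze

# Jaccard similarity (an assumed scoring rule): shared tags / distinct tags.
def similarity(a, b)
  tags_a = a["tags"].to_set
  tags_b = b["tags"].to_set
  union = tags_a | tags_b
  return 0.0 if union.empty?

  (tags_a & tags_b).size.to_f / union.size
end

# Every item is scored against every other item: n * (n - 1) comparisons.
def recommendations(items, limit: 2)
  items.to_h do |item|
    others = items.reject { |other| other.equal?(item) }
    ranked = others.sort_by { |other| -similarity(item, other) }.first(limit)
    [item["slug"], ranked.map { |other| other["slug"] }]
  end
end

puts recommendations(ARCHIVE).inspect
```

With four items that is only 12 comparisons, but at a thousand items it is roughly a million per build, which is the kind of growth that pushes a build past a fixed timeout.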
In case it’s useful for anyone else, here is what I did to fix it:
- I switched my site from the default GitHub Pages deploy to a custom GitHub workflow. There were a few gotchas there; look at the linked workflow file for what I eventually landed on. Not only does this overcome the 10-minute build timeout, but it also (finally!) let me upgrade Jekyll and use (make) custom plugins.
- This overcame the timeout problem, but the extra overhead actually increased my build times. I upgraded to Jekyll 4.2 and replaced all my `site.foos | where: "slug", bar | first` with the more sensible `site.foos | find: "slug", bar`. This did make my build slightly faster, but I was honestly underwhelmed by the improvement, so then…
- I rewrote the offending `_include` code in Ruby as a custom Tag plugin. This was relatively straightforward, but came with a few gotchas:
  - My original include called another include. In order to render a vanilla include from Ruby-land, I had to copy-paste (!) the IncludeTag’s `load_cached_partial` function into my Tag to be able to load the include I wanted to render. (Perhaps that Jekyll method could be made public / static?)
  - Sometimes when you fetch a page object from `context` it comes back as a `Document` and sometimes it comes back as a `DocumentDrop`? Eventually, I figured out that the best way to get data off these things consistently and reliably was to call `.to_liquid.to_h` on any such object first, converting it to a plain old Ruby Hash.
  - Lastly, this hacking and debugging was quite slow at first, as I had to stop and restart the server each time to reload the Ruby file (unlike Liquid changes, which can reload on the fly). Eventually I figured out a hack: a custom tag can accept arbitrary `text` after its name. So, I passed that `@text` into an `eval()` in my render function, thus allowing me to run arbitrary code from the proper context without having to restart the server. Not sure if there is a nicer way to drop a debugger in, but this worked for me.
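As a rough illustration of the `where | first` versus `find` swap in the list above, here is the same idea in plain Ruby. The `COLLECTION` data and the `where_first` / `find_first` helpers are hypothetical stand-ins written for this sketch, not Jekyll's filter implementations, and the plain hashes stand in for real documents.

```ruby
# Hypothetical stand-in for a Jekyll collection: plain hashes, not real Documents.
COLLECTION = (1..1000).map do |i|
  { "slug" => "item-#{i}", "title" => "Item #{i}" }
end.freeze

# Approximates `where: "slug", bar | first`: scans the whole collection,
# builds an intermediate array of every match, then takes the first one.
def where_first(items, key, value)
  items.select { |item| item[key] == value }.first
end

# Approximates `find: "slug", bar`: stops at the first match and
# allocates no intermediate array.
def find_first(items, key, value)
  items.find { |item| item[key] == value }
end

puts find_first(COLLECTION, "slug", "item-3")["title"]
```

Both versions return the same item, and both are still a linear scan per lookup in the worst case. Inside a loop that is already O(n^2), that makes the saving a constant factor rather than a change in complexity, which may be why the improvement felt underwhelming.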
At the end of the day, I managed to speed up this particular O(n^2) path by about 10x and cut my overall build time roughly in half: a result I’m quite happy with! If you have any thoughts or feedback, please comment below, and thanks for reading!