Replacing a slow include with a custom Ruby Tag

KhBh · May 21, 2021, 2:58pm

On most pages on my website, I have a footer which recommends similar items in the archive. Since I have very little text about these items, and lots of rich metadata, I decided to implement my own content similarity engine and, since I was using GitHub Pages, I did this entirely in Liquid.

Fast forward a few months, and the site now has over a thousand pieces of content. Since the similarity algo is O(n^2), this blew up and the other day my build times exceeded the GitHub Pages timeout of 10 mins / build.

In case it’s useful for anyone else, here is what I did to fix it:

I switched my site from the default GitHub Pages deploy to a custom GitHub workflow. There were a few gotchas there; look at the linked workflow file for what I eventually landed on. Not only does this overcome the 10 min build timeout, but it also (finally!) let me upgrade Jekyll and use (and write) custom plugins.
This overcame the timeout problem, but the added overhead actually increased my build times. I upgraded to Jekyll 4.2, and replaced every site.foos | where: "slug", bar | first with the more sensible site.foos | find: "slug", bar. This did make my build slightly faster, but I was honestly underwhelmed by the improvement, so then…
I rewrote the offending _include code in Ruby as a custom Tag _plugin. This was relatively straightforward, but came with a few gotchas:
1. My original include called another include. In order to render a vanilla include from Ruby-land, I had to copy-paste (!) the IncludeTag’s load_cached_partial function into my Tag to be able to load the include I wanted to render. (Perhaps that Jekyll method could be made public / static?)
2. Sometimes when you fetch a page object from context it comes back as a Document and sometimes it comes back as a DocumentDrop? Eventually, I figured out that the best way to get data off these things consistently and reliably was to call .to_liquid.to_h on any such object first, converting it to a plain old Ruby Hash.
3. Lastly, this hacking and debugging was quite slow at first, as I had to stop and restart the server each time to reload the Ruby file (unlike Liquid changes, which can reload on the fly). Eventually I figured out a hack: a custom tag can accept arbitrary text after its name. So, I passed that @text into an eval() in my render function, which let me run arbitrary code from the proper context without having to restart the server. Not sure if there is a nicer way to drop a debugger in, but this worked for me.
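
For gotcha 2, here is a minimal sketch of the kind of helper I mean. FakeDocument and FakeDrop are hypothetical stand-ins (not Jekyll classes) so the snippet runs without Jekyll installed, and plain_hash is just a name I made up. The only thing it relies on is that the object answers to_liquid and the result answers to_h.

```ruby
# Hypothetical stand-in for a drop: already "liquid-ready", exposes to_h.
class FakeDrop
  def initialize(data)
    @data = data
  end

  # Like Liquid drops, to_liquid on a drop returns the drop itself.
  def to_liquid
    self
  end

  def to_h
    @data.dup
  end
end

# Hypothetical stand-in for a document: has no to_h of its own,
# but to_liquid hands back a drop that does.
class FakeDocument
  def initialize(data)
    @data = data
  end

  def to_liquid
    FakeDrop.new(@data)
  end
end

# Normalize whatever came out of the context into a plain Ruby Hash.
# Hashes pass through untouched; anything else goes via to_liquid.to_h.
def plain_hash(obj)
  return obj if obj.is_a?(Hash)

  obj.to_liquid.to_h
end

page = plain_hash(FakeDocument.new("slug" => "intro", "tags" => ["ruby"]))
puts page["slug"]
```

With something like this at the top of render, the rest of the tag only ever deals with Hashes, whichever of the two shapes the context happened to return.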

At the end of the day, I managed to speed up this particular O(n^2) path by about 10x and cut my overall build time roughly in half: a result I’m quite happy with! If you have any thoughts or feedback, please comment below, and thanks for reading!

KhBh · December 6, 2022, 7:18am

Fast-forward a year and a half and my site has grown large enough that even this nativized O(n²) algorithm is too slow.

Thankfully, k-NN is a known algorithm, and by precomputing some binary space partitions you can narrow down the search space dramatically.

In my case, that meant simply keeping around N sets of all the posts with a given tag, and then merging those sets to find the posts with the highest tag overlap.
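
Roughly, the idea looks like the sketch below. This is not my actual plugin code: the posts hash (slug => list of tags), the function names, and the tie-breaking by slug are all assumptions made for illustration.

```ruby
require "set"

# Build one set of slugs per tag, once for the whole site.
# posts is assumed to be a Hash of slug => array of tags.
def build_tag_index(posts)
  index = Hash.new { |hash, tag| hash[tag] = Set.new }
  posts.each do |slug, tags|
    tags.each { |tag| index[tag] << slug }
  end
  index
end

# Merge the sets for this post's tags, counting how often each other
# post shows up. Only posts sharing at least one tag are ever visited.
def most_similar(slug, posts, index, limit = 3)
  overlap = Hash.new(0)
  posts.fetch(slug).uniq.each do |tag|
    index.fetch(tag, []).each do |other|
      overlap[other] += 1 unless other == slug
    end
  end
  # Highest overlap first; ties broken alphabetically by slug.
  overlap.sort_by { |other, count| [-count, other] }
         .first(limit)
         .map(&:first)
end

posts = {
  "a" => ["ruby", "jekyll", "liquid"],
  "b" => ["ruby", "jekyll"],
  "c" => ["liquid"],
  "d" => ["cooking"]
}
index = build_tag_index(posts)
p most_similar("a", posts, index) # prints ["b", "c"]
```

The index is built in one pass over all posts, and each lookup then only touches posts that share a tag with the current one, instead of comparing against every post on the site.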

My build is now ~7x faster. Happy Jekylling!

Related topics

Topic	Category	Replies	Views	Activity
Help us benchmark Jekyll	Share	16	5355	March 25, 2021
Slow build time with jekyll-rtd-theme - numerous liquid include's		7	1185	July 15, 2021
How I reduced my Jekyll build time by 61%	Share	14	4758	July 28, 2019
Build time reduction	Help	3	821	May 25, 2018
Help with experiment	Help	4	631	February 21, 2019