Ran into something similar on my end recently and stepped through it the same way you’re describing. The part that tripped me up initially was how sensitive the defaults were to the environment — once I adjusted for that the rest fell into place. Curious what you landed on in terms of the testing loop, because iteration speed seemed to matter more than the specific choice of tool. Happy to trade notes if anyone has a cleaner workflow.
By “in terms of the testing loop,” do you mean testing the Jekyll structure and Liquid syntax, or testing my own code?
If the latter: since I’m using PowerShell, the unit tests were really easy to write with a module called Pester. Integration/regression tests basically meant feeding in a known-good site and visually checking that it builds correctly. Also, AI.
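For anyone who hasn't seen Pester, here is a minimal sketch of what a unit test can look like, assuming Pester 5 is installed. `ConvertTo-Slug` is a hypothetical helper made up for illustration; it is not from the project discussed above.

```powershell
# Assumes Pester 5. ConvertTo-Slug is a hypothetical function, defined here
# only so the example is self-contained.
BeforeAll {
    function ConvertTo-Slug {
        param(
            [Parameter(Mandatory)]
            [string]$Title
        )
        # Lowercase the title, collapse each run of non-alphanumeric
        # characters into one hyphen, then trim leading/trailing hyphens.
        ($Title.ToLowerInvariant() -replace '[^a-z0-9]+', '-').Trim('-')
    }
}

Describe 'ConvertTo-Slug' {
    It 'lowercases the title and replaces punctuation with hyphens' {
        ConvertTo-Slug -Title 'Hello, World!' | Should -Be 'hello-world'
    }

    It 'leaves no leading or trailing hyphen' {
        ConvertTo-Slug -Title '  Draft: Post  ' | Should -Be 'draft-post'
    }
}
```

Saved under a name such as `ConvertTo-Slug.Tests.ps1` (the file name is also an assumption), it can be run with `Invoke-Pester -Path ./ConvertTo-Slug.Tests.ps1`.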