Long time between watcher starting and server running

Hi everyone!
I am new to Jekyll and started rebuilding our work group’s old website using version 3.8.5 (according to the lock file). This includes a lot of PDFs, videos and other static content, some of which are lecture slides with interactive material.
This content sits in a directory called “downloads” next to the source directory, since the static files are not needed for site generation. Inside _site I have a symbolic link to “downloads”, which I keep and exclude using “keep_files” and “exclude” in my config file. The source directory that is built into the target contains no “downloads” directory.

If I remove the symbolic link, the site builds in around 9 to 12 seconds and the server starts immediately afterwards. After a change, it takes around 7 seconds until the change is built. Once I put the link back in (or just the whole “downloads” folder containing 6GB of small files), the build time stays about the same, but around 8 minutes pass between Jekyll reporting the build time and the server actually running. My guess here is that the files in “downloads” in _site are still being watched, which I definitely do not want.
If I then change a single file, Jekyll needs around 4 more minutes to actually report the change and finish the build. After this first hiccup everything is quick and behaves as usual. My second guess is that Jekyll needs to construct some index structure once, for the first update, listing all the files that are being watched in _site.
Both of these things are guesses, I have no idea how Jekyll actually works.
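One rough way to check the first guess is to time a plain recursive walk of _site that follows symbolic links into the linked tree. This is a minimal Ruby sketch, not anything Jekyll itself runs; the method name walk_following_symlinks is hypothetical. It follows symlinked directories (which Ruby's Find.find does not do) and records each directory's real path, so a symlink pointing back at an ancestor cannot make it recurse forever.

```ruby
# Count directories and files reachable from +root+, following symlinked
# directories. Each directory is visited once, keyed by its real path, so a
# symlink pointing back at an ancestor cannot cause endless recursion.
def walk_following_symlinks(root)
  seen  = {}
  stats = { dirs: 0, files: 0 }
  stack = [root]
  until stack.empty?
    dir  = stack.pop
    real = File.realpath(dir)
    next if seen[real]

    seen[real] = true
    stats[:dirs] += 1
    begin
      Dir.each_child(dir) do |name|
        path = File.join(dir, name)
        # File.directory? follows symlinks, so linked directories are queued too.
        if File.directory?(path)
          stack << path
        else
          stats[:files] += 1
        end
      end
    rescue Errno::EACCES
      # Unreadable directory: it is counted, but its contents are skipped.
    end
  end
  stats
end
```

Timing walk_following_symlinks("_site") once with and once without the link (for example between two Process.clock_gettime(Process::CLOCK_MONOTONIC) calls) shows how long merely visiting every entry of the 6GB tree takes on your disk. A result in the range of minutes would be consistent with, though not proof of, something traversing the link at startup.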

All of this won’t be a problem for actual deployment. I plan to let a hook rsync “downloads” into the _site directory while I deploy the whole thing to the actual web server directory. It is, however, very cumbersome for testing and previewing the site offline.
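If that hook were a Jekyll plugin rather than, say, a shell or Git hook, a minimal sketch could look like the following. The file name _plugins/sync_downloads.rb, the helper downloads_rsync_command, and the choice to run only when JEKYLL_ENV=production are hypothetical; Jekyll::Hooks.register with :site and :post_write, Jekyll.env, site.source and site.dest come from Jekyll's plugin API. Since “downloads” is listed in keep_files, Jekyll should leave the copied directory alone on later rebuilds.

```ruby
# _plugins/sync_downloads.rb (hypothetical file name)

# Build the rsync argument list. The trailing "/" on the source makes rsync
# copy the directory's contents into <dest_dir>/downloads/.
def downloads_rsync_command(source_dir, dest_dir)
  ["rsync", "-a", File.join(source_dir, ""), File.join(dest_dir, "downloads", "")]
end

# Register the hook only when loaded by Jekyll, so the helper above can also
# be loaded and checked on its own.
if defined?(Jekyll::Hooks)
  Jekyll::Hooks.register :site, :post_write do |site|
    # Skip `jekyll serve` and other development builds.
    next unless Jekyll.env == "production"

    # Assumes downloads/ sits next to the source directory, as described above.
    src = File.expand_path("../downloads", site.source)
    ok = system(*downloads_rsync_command(src, site.dest))
    Jekyll.logger.warn("Downloads:", "rsync failed") unless ok
  end
end
```

With this in place, JEKYLL_ENV=production bundle exec jekyll build would copy the files, while a plain bundle exec jekyll serve would not touch them.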

There are already other questions asking how to keep excluded files and directories from being watched. As far as I could tell, these did not reach a conclusion but settled on a workaround instead. Their core problem was also a bit different.
My current workaround is using a Python SimpleHTTPServer to view the site and rebuilding manually every time I make changes. I’d like to avoid this for the sake of other people working on the site later on.

If you have any guesses about what causes the problem and how to avoid it, I would gladly appreciate them!

Thanks for reading and in advance for helping!

Any entry in the exclude list defined in your config file is automatically ignored by the watcher. You can list simple filenames, file paths, or entire directories.

As of v3.8.5, when excluding a custom file or directory, you need to list the paths that are excluded by default (Gemfile, Gemfile.lock, node_modules/, vendor/, etc.) in addition to your custom paths.
In v4.0, you only need to list your custom paths. The paths excluded by default are always ignored internally.
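To make the difference concrete, here is a small Ruby sketch that produces the exclude and keep_files settings as YAML for _config.yml. The list of 3.8.x defaults below is an assumption based on Jekyll's default configuration for that series; compare it against the version you actually have installed. The helper names exclude_list and jekyll_config are hypothetical.

```ruby
require "yaml"

# Paths Jekyll 3.8.x excludes by default (assumption: taken from Jekyll's
# default configuration; check against your installed version).
JEKYLL_38_DEFAULT_EXCLUDES = %w[
  Gemfile Gemfile.lock node_modules
  vendor/bundle/ vendor/cache/ vendor/gems/ vendor/ruby/
].freeze

# In 3.8.x a custom `exclude` replaces the defaults, so they are re-listed;
# from 4.0 on, only the custom entries are needed.
def exclude_list(custom, jekyll_major: 3)
  jekyll_major >= 4 ? custom.dup : (JEKYLL_38_DEFAULT_EXCLUDES + custom).uniq
end

def jekyll_config(custom_excludes, jekyll_major: 3)
  {
    "exclude" => exclude_list(custom_excludes, jekyll_major: jekyll_major),
    "keep_files" => ["downloads"]
  }
end

# Print the settings to paste into _config.yml.
puts jekyll_config(["downloads/"]).to_yaml
```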

The point here being: if you don’t want Jekyll to handle files in downloads/, just exclude the entire directory and delegate their handling to a Gulp task, a Python script or a similar route (all of which you would need to implement yourself).
Moreover, if you’re building or developing on Windows, add gem 'wdm' to your Gemfile to improve the watcher’s response times.
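For reference, the Gemfile line could look like this, guarded so that it only applies on Windows. The version constraint is an assumption; adjust it to whatever your setup needs.

```ruby
# Gemfile: install the Windows Directory Monitor only on Windows, where it
# lets the watcher receive change notifications instead of polling.
gem "wdm", ">= 0.1.0" if Gem.win_platform?
```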

Thanks for your reply!
downloads is already excluded as a directory via exclude, and it is not even located in the source directory. The target directory contains merely a symbolic link to it. Having or not having this one link in _site makes all the difference in serve times.
I’m developing under Ubuntu 18.04 (Bionic), by the way.

Ah yes! You had mentioned this clearly in the opening post of this thread.
I misread it.

This is strange and totally unexpected behavior, because the watcher only watches the source directory.
Would you be able to set up a public test repository with the same file and folder tree and config file as your production site, but without the actual files in downloads/…?
(so that I can clone it locally and experiment with it…)

I’m afraid I can’t give you the actual files I’m working on, but I can set up a dummy site after work with the same structure and similar linking to a hypothetical downloads directory. Would that be of any help to you?
A colleague mentioned it could be the Jekyll server managing URLs internally. To my knowledge, Jekyll does things like mapping a file such as people/doe-john.md to the URL people/doe-john/ (without an index.html file being present) instead of people/doe-john.html. I guess this needs some internal indexing as well. Maybe this process eats up a lot of time if there are many files in _site?

A dummy site would be perfect!

Jekyll does manage URLs internally, but doesn’t consider files inside _site for this. I will have to test whether symlinks play a nefarious role here, though…
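To illustrate why serving extensionless URLs does not by itself require an up-front index of _site: a static server can resolve each request with a few file checks at request time. The following is a simplified Ruby sketch of that idea, not Jekyll's actual servlet code; resolve_request is a hypothetical name, and it ignores ".." handling and MIME types.

```ruby
# Resolve a request path to a file under +site_dir+ the way a simple static
# server might: the exact file, then <path>/index.html, then <path>.html.
def resolve_request(site_dir, url_path)
  base = File.join(site_dir, url_path.sub(%r{\A/+}, ""))
  candidates = [base, File.join(base, "index.html"), "#{base.chomp('/')}.html"]
  # Each candidate costs a single stat call, made only when a request arrives.
  candidates.find { |path| File.file?(path) }
end
```

For a request to /people/doe-john/, this checks at most three paths, regardless of how many other files sit in _site.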

Ok, got around to setting it up. Just clone

You’d be surprised how little it takes to reproduce that behaviour, at least on my end.
Your setup could look somewhat like this:
.
├── dummy_top
| ├── dummy_site.git
| | ├── _site
| | | ├── downloads->../../downloads
| | | └── index.md
| | ├── index.md
| | └── _config.yml
| └── downloads (6GB folder containing lots of small files)
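If you prefer to script this layout, here is a minimal Ruby sketch; build_dummy_layout is a hypothetical helper, the file contents are placeholders, and the _config.yml it writes is an assumption mirroring the keep_files/exclude setup from the opening post. The remaining contents of _site are left for Jekyll to generate.

```ruby
require "fileutils"

# Recreate the tree above under +top+ (hypothetical helper).
def build_dummy_layout(top)
  site = File.join(top, "dummy_site.git")
  FileUtils.mkdir_p(File.join(site, "_site"))
  FileUtils.mkdir_p(File.join(top, "downloads"))
  File.write(File.join(site, "index.md"), "---\n---\nHello\n")
  File.write(File.join(site, "_config.yml"), "exclude: [downloads]\nkeep_files: [downloads]\n")
  # Relative link resolved from _site/: ../../downloads is <top>/downloads.
  link = File.join(site, "_site", "downloads")
  File.symlink("../../downloads", link) unless File.symlink?(link)
  link
end
```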

If you don’t get very long server start-up times, please tell me. This setup has the problem on my computer. If you clone it, put another downloads directory next to it and don’t run into problems, then it has something to do with my setup.

I was not able to reproduce this issue with your setup.
Due to technical constraints, I couldn’t test this on my local system, so I booted up a Linux virtual machine with Ruby, Git and Jekyll installed.

I then cloned your dummy_site to trials/dummy_site and programmatically generated a trials/downloads folder with 10000 files of about 95 KB each.
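The reply does not say how the files were generated; one way to create comparable test data is a small Ruby helper like the one below. generate_files is a hypothetical name, and the file content is plain filler.

```ruby
require "fileutils"

# Fill +dir+ with +count+ files of +size+ bytes each (hypothetical helper).
def generate_files(dir, count:, size:)
  FileUtils.mkdir_p(dir)
  filler = "x" * size
  count.times do |i|
    File.write(File.join(dir, format("file_%05d.bin", i)), filler)
  end
  dir
end
```

For example, generate_files("trials/downloads", count: 10_000, size: 95 * 1024) creates data of roughly the size described above; raising count gets closer to the 6GB case from the opening post.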

Running du -h downloads produces:

899M    downloads

I then added a Gemfile to ensure that I’d be building with Jekyll 3.8.5 only:

# frozen_string_literal: true

source "https://rubygems.org"
gem "jekyll", "3.8.5"

I then started the server with bundle exec jekyll serve
Observation: done in 0.016 seconds.

In a new terminal window on the VM, I made some changes to trials/dummy_site/index.md (added front matter so that it is not static, edited HTML markup).
Observation: ...done in 0.002979388 seconds.

Inference: Symlink inside _site to a 899MB directory outside <source> directory has no visible effect on Jekyll regeneration.

Thanks, and sorry for the trouble!
But sadly, build time is not the issue. My dummy builds in 0.088 seconds. The time between Jekyll reporting
Generating...
             done in 0.088 seconds.
and it telling me
Auto-regeneration: enabled for '/home/mvogelsang/jekyll/dummy'
  Server address: http://127.0.0.1:4000
 Server running... press ctrl-c to stop.
is what takes all the time. You don’t even need to create data to see what I mean.
Instead of a symbolic link “downloads”, I made a normal directory “downloads” and put a symbolic link in it to an arbitrary 1.2GB directory I had lying around on my system. Having that link versus not having it makes a difference of 11 seconds between
done in 0.0xx seconds.
and
Auto-regeneration: enabled for '/home/mvogelsang/jekyll/dummy'

So, with your setup, does having versus not having the link make a difference in how long it takes until you can access localhost:4000?
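One way to answer this with a number rather than by watching the terminal is to start bundle exec jekyll serve and, at the same moment, a small script that keeps trying to connect to port 4000. This is a minimal Ruby sketch; the method name seconds_until_listening and its defaults are hypothetical, and it only measures when the port starts accepting connections, not when the first page renders.

```ruby
require "socket"

# Try to open a TCP connection to host:port every +interval+ seconds.
# Returns the seconds waited until a connection succeeded, or nil once
# +timeout+ seconds have passed without success.
def seconds_until_listening(host, port, timeout: 900, interval: 0.5)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  loop do
    begin
      TCPSocket.new(host, port).close
      return Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
    rescue SystemCallError
      # Nothing listening yet (typically Errno::ECONNREFUSED).
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
      return nil if elapsed >= timeout
      sleep interval
    end
  end
end
```

Saved as, say, wait_for_port.rb (hypothetical name), it could be run as ruby -r ./wait_for_port -e 'p seconds_until_listening("127.0.0.1", 4000)' once with the link in _site and once without, and the two numbers compared.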

The reason I booted a Linux VM is because my local system is on Windows.
Either way, I tried to reproduce this issue both with a symbolic downloads/ and with a normal directory containing a symlink, in both cases pointing to a 1.2GB directory.

In both cases, the time between done in 0.0xxx seconds. and
Auto-regeneration: enabled for 'xxxxxx' was negligible.

Therefore, until someone else is able to reproduce this, it looks like an issue specific to your system.
Sorry.

Too bad, but thanks for your help!
Hopefully it really is a problem specific to me and no one else will run into the same issues.