Jekyll not processing files

rcrooks · September 16, 2021, 5:39pm

We run 15 Jekyll-based web sites. I recently got all the source pages translated into several languages (these go on separate localized sites). The files I got back are fine as far as the code goes, but Jekyll does not process the pages, just copies them as is to _site.

It has to be something in the file properties, because I can simply copy/paste the contents of any page into a new file, and Jekyll processes them fine (I’m doing all this in Visual Studio Code).

I can’t see anything in the file properties that looks problematic. Has anyone seen this, or do you have any ideas on how to identify/fix the problem?

Many thanks,
Robert

rdyar · September 16, 2021, 6:58pm

Do they have front matter? The two lines of --- at the top. Without that, Jekyll will just copy them.

rcrooks · September 16, 2021, 7:17pm

Yes, they have that. No extra lines added at the top or anything like that. I can just copy and paste the contents into a new file, and it gets processed without a problem.

rdyar · September 16, 2021, 11:12pm

Are you saying you sent the files out to a 3rd party to convert, and the files that came back have this issue?

There used to be an issue where Jekyll was very picky about UTF-8 and would do something wrong if a file was not UTF-8, but I don’t remember exactly what. That could explain what you are seeing.

Are there any errors if you run jekyll build --verbose?

rdyar · September 16, 2021, 11:16pm

What version of Jekyll are you using? I think the UTF-8 issues were fixed in 3.6.

rcrooks · September 17, 2021, 12:07am

Correct: the files were sent to a 3rd party to convert. (I’m also working with them to try to diagnose it.) We’re using Jekyll 4.2. I’ll try --verbose to see if anything else comes through.

thanks!

MichaelCurrin · September 17, 2021, 6:05am

Make sure to set a layout.

---
title: My title
layout: page 
---

## My body

Or set a default in your config so all pages use the page layout.

And then you need this to exist:

_layouts/page.html

rcrooks · September 17, 2021, 1:02pm

Yes, the pages have a title, and a default layout is set in _config.yml.

BillRaymond · September 17, 2021, 4:02pm

If it is as @rdyar suggests, I did a quick Google search and found lots of command-line tools to batch convert to UTF-8. Maybe try one of those methods just to make sure it’s not that? I’ve had issues with documents that look perfectly fine and then do not work after the fact (for ebooks, but still similar challenges).

rcrooks · September 17, 2021, 4:40pm

Thanks, Bill. I’ll give that a try. I think I found out why this is happening, but I’m not sure what the fix is. I compared a working and a non-working version of the same file in a hex editor, and found that in the non-working file there are three hidden characters before the initial --- (see screenshot; that’s the only difference I can see).

Maybe converting to UTF-8 will fix that…
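The same comparison can be done without a hex editor. The following is only a rough sketch, assuming the hidden characters are the UTF-8 byte order mark (bytes EF BB BF); has_bom and list_bom_files are made-up helper names, not part of Jekyll or any existing tool.

```shell
# has_bom FILE: succeed (exit status 0) when FILE starts with the bytes EF BB BF.
# Assumption: the three hidden characters are the UTF-8 byte order mark.
has_bom() {
    # Dump the first three bytes as hex and strip the whitespace od adds.
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d '[:space:]')" = "efbbbf" ]
}

# list_bom_files DIR: print every *.html file under DIR that starts with a BOM.
list_bom_files() {
    find "$1" -type f -name '*.html' -print | while IFS= read -r f; do
        if has_bom "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

After pasting the functions into a terminal, running list_bom_files on the folder of translated pages should print only the files that carry the extra bytes.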

BillRaymond · September 17, 2021, 5:06pm

I love the investigatory work, but it just looks like The Matrix to me

I’m sure you are right though that the format is slightly different

rcrooks · September 17, 2021, 7:59pm

Yes, I’m sure now that’s the issue, and I’ve identified the encoding - time for the batch conversion to UTF-8

rcrooks · September 18, 2021, 6:56pm

So, have tried several shell scripts I found online, but after much tinkering with them, can’t make one work.

Does anyone know of a good way to convert the encoding of all HTML files in a directory structure? (macOS)

BillRaymond · September 21, 2021, 7:30pm

Okay, I played around with this, and it is way over my head, but I think I got you started :-). This assumes you are using a Mac, or have Bash or Zsh installed on your Windows or Linux machine.

Create a temporary folder and copy your html files over to it (I do not want to be responsible if something goes wrong).

Open a terminal window (zsh or bash will work) and type the following (replace filename.html with the name of one of the files you copied):
file -I filename.html

You will see the output. I bet the charset says ASCII (us-ascii), but if it does not, make a note of the charset it reports.

If it is NOT ASCII, type the following command:
iconv --list or iconv -l

Pick the encoding name that matches what you want to convert from. Note that some lines list several names separated by spaces. For example, you might see HP-ROMAN8 R8 ROMAN8 CSHPROMAN8. Just pick one of those, as they are all names for the same encoding (I believe R8 is the same thing as ROMAN8).

Using a code editor, create a new file in that new folder and call it convert.sh with the following code:

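# Convert every .html file in the current folder to UTF-8
for file in *.html; do
    # Skip output from a previous run so it is not converted twice
    case "$file" in *.utf8.html) continue ;; esac
    echo "Working on file: $file"
    newfile="${file%.html}.utf8.html"
    # Replace "ASCII" with the source encoding reported by file -I
    iconv -f "ASCII" -t "UTF-8" "$file" > "$newfile"
    echo "Output to new file: $newfile"
    file -I "$newfile"
done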

As mentioned, if your input file type does not just say “ASCII”, then replace it with the closest match from the iconv list you got earlier.

Also, if your files have a different extension, like [filename].md or [filename].markdown, then replace any instance of html with md or markdown.

While in the terminal window, go to the folder where your copied files are located.

For example, if you are on macOS and the files are on your desktop in a convert folder, type this:
cd ~/Desktop/convert

Type ls to make sure you see all the copied files.

Now you are going to run the convert.sh file you created. Try one of these, and if it throws an error, try the other:

zsh convert.sh

-or-

bash convert.sh

You should see the code process all the files, and now you will have a whole bunch of new .utf8.html files (or .md or .markdown or whatever).

In your Jekyll site, temporarily move the original translated files somewhere else. Copy over the newly converted utf8 files and see if Jekyll builds.
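The loop in convert.sh only looks at the current folder. For files spread across subfolders, a rough sketch along the same lines could use find; convert_tree is a made-up name, and the source encoding is passed in as an argument because it depends on what file -I reported for your files.

```shell
# convert_tree DIR FROM: write a UTF-8 copy of every *.html file under DIR,
# reading the originals as encoding FROM (a name taken from `iconv -l`).
convert_tree() {
    # Skip *.utf8.html so output from an earlier run is not converted again.
    find "$1" -type f -name '*.html' ! -name '*.utf8.html' -print |
    while IFS= read -r f; do
        out="${f%.html}.utf8.html"
        # The original file is left untouched; a failed conversion is cleaned up.
        if iconv -f "$2" -t UTF-8 "$f" > "$out"; then
            printf 'converted %s -> %s\n' "$f" "$out"
        else
            printf 'failed: %s\n' "$f" >&2
            rm -f "$out"
        fi
    done
}
```

For example, convert_tree ~/Desktop/convert ISO-8859-1 would treat every page as Latin-1; that encoding is only an illustration, so substitute whatever matches your files.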

I hope this helps. This was something new to learn, so hopefully, I did not send you down the wrong path.

rcrooks · September 21, 2021, 8:25pm

Wow, thank you for all that effort!

As it happened, I didn’t actually need to convert the files. I just needed to get rid of the BOM (the byte order mark, which is what those three hidden characters at the start of each file were). I chatted with one of our engineers about it this morning, and he came up with a one-line solution:

find /Users/rcrooks/translations -type f -name '*.html' -exec perl -e 's/\xef\xbb\xbf//;' -pi {} \;

BillRaymond · September 21, 2021, 9:35pm

Great! Glad you got it sorted, and thank you for posting the solution.
