Change a huge GitHub Pages repo to use collections and linking

Hi fellow Jekyllers!

I have a huge GitHub Pages repo where all documents are Markdown files. The site is currently set up without collections, but I would like to enable them to get pretty links.

Currently all the documents are generated into the base folder “_site”, and the links are “docs.something.com/page.html”.
What I would like is to have the links turn out to be “docs.something.com/docs/collectionName/page”.

I have managed to achieve this by using collections and defining each of them. However, hundreds of Markdown files link internally to the base folder ([some_name](xxxxx.html)), and changing all of those links manually would be a huge job.

Is there any way to avoid changing all internal links in the Markdown files and automagically let Jekyll find the file? Or is there another best-practice way to achieve this?

Thanks!


Hi Gotnoname,

I don’t have an answer, but I’m interested, as I’m in the process of creating 1200 blog posts in the _posts directory. They are auto-created from a SEDE query run against the current Stack Exchange Data Dump.

I wasn’t aware of the links you mention, and so far I work purely online with no local storage. I currently use Python to convert Stack Exchange Markdown to GitHub Kramdown / GFM, so I would use Python in a situation like yours (although I don’t fully understand the problem).

Good luck!

Before going on, let me clarify what I believe you are asking, which is that you have a page called /docs/sometopic/thedoc.markdown, and in that file, you might have a link that looks like this:

<a href='/reference.html'>Check out this reference file</a>

And you want to change the URL to look like this:
<a href='/docs/anothertopic/reference.html'>Check out this reference file</a>

Is that correct?

If so, Jekyll does not really offer an option to change URLs for you. That said, @MichaelCurrin solved a similar problem (bulk-changing dates in a lot of files) by using a regex in Visual Studio Code. Here is the article:

I suspect you can do something similar here: set up a one-time bulk search-and-replace that uses a regex to find, say, /reference.html and replace it with /docs/anothertopic/reference.html.

Unfortunately, I do not have the regex experience to provide all the details here, but it seems like a decent solution. Hopefully a filename, a YAML tag, or something similar makes it easy to know which folder the updated href should point to.
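For anyone who would rather script it than do it in an editor, here is a minimal Python sketch of that kind of one-time replacement. It assumes links appear either as href='...' / href="..." or as a Markdown (...) target; `replace_link_target` is a hypothetical helper name, and the paths are just the example ones from this post. It only rewrites the path when it is the whole link target, so something like /old/reference.html is left alone.

```python
import re


def replace_link_target(text, old, new):
    """Swap `old` for `new` only where `old` is the entire target of an
    HTML href='...' / href="..." or a Markdown [text](...) link."""
    pattern = re.compile(
        # Group 1 is the opening delimiter: href=' or href=" or ](
        r"""(href=['"]|\]\()"""
        + re.escape(old)
        # The target must be followed directly by a closing quote or ")"
        + r"""(?=['")])"""
    )
    # A function replacement stops re from treating backslashes in `new` specially.
    return pattern.sub(lambda m: m.group(1) + new, text)
```

You would read each .md file, pass its text through this function, and write it back. Committing first makes the bulk change easy to review and revert.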


I’d need to see a couple more scenarios to fully understand the intention, and then I could advise on a regex replace in VS Code.

Additionally, you can scope a search or a find-and-replace in VS Code to a directory and exclude others.

So you might write a replacement pattern, run it against one directory, and repeat that for each collection.

Or maybe one find-and-replace across the whole codebase will work.
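If you prefer a script over VS Code’s scoped search, the same idea (replace within one directory, skip others) could look roughly like this in Python. The function names and the default excluded folders (`_site`, `.git`) are my own assumptions, not anything Jekyll defines.

```python
from pathlib import Path


def in_scope(path, root, exclude=("_site", ".git")):
    """True if `path` (somewhere under `root`) is not inside an excluded folder."""
    return not any(part in exclude for part in Path(path).relative_to(root).parts)


def scoped_replace(root, old, new, exclude=("_site", ".git"), pattern="*.md"):
    """Plain find-and-replace limited to files under `root`, skipping
    excluded folders. Returns the list of files that were changed."""
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        if not in_scope(path, root, exclude):
            continue
        text = path.read_text(encoding="utf-8")
        if old in text:
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed.append(path)
    return changed
```

Calling `scoped_replace("_docs/subfolder", "old text", "new text")` would only touch that one folder; you would then repeat the call with the pattern for the next collection.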


I’d also recommend using Jekyll’s link tag so that paths are checked at build time. Do that at least after the update, but ideally before it too, so you can validate all the old paths first.

e.g.

[My page]({% link abc.md %})

[Collection page]({% link _my_collection/def.md %})
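The link tag stops the build when the file it names does not exist. To check the old, plain links before converting them, a rough Python sketch could list link targets that have no matching source file. It assumes a page.html link is backed by a page.md in the same folder and that links starting with / are relative to the site root; the helper names are hypothetical.

```python
import re
from pathlib import Path

# Target of an inline Markdown link: [text](target). Liquid tags such as
# ({% link abc.md %}) contain spaces, so they are deliberately not matched.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
SKIP_PREFIXES = ("http://", "https://", "mailto:", "#")


def link_targets(markdown):
    """Return the raw target of every inline Markdown link."""
    return LINK_RE.findall(markdown)


def candidate_paths(md_file, target, root):
    """Source files that could back `target` when linked from `md_file`."""
    local = target.split("#", 1)[0]  # drop any #anchor
    if local.startswith("/"):
        base = Path(root) / local.lstrip("/")  # site-root-relative link
    else:
        base = Path(md_file).parent / local  # relative to the linking page
    candidates = [base]
    if base.suffix == ".html":
        # Assumption: page.html is generated from page.md alongside it.
        candidates.append(base.with_suffix(".md"))
    return candidates


def broken_links(root):
    """List (file, target) pairs whose target has no matching source file."""
    problems = []
    for md in sorted(Path(root).rglob("*.md")):
        for target in link_targets(md.read_text(encoding="utf-8")):
            if target.startswith(SKIP_PREFIXES):
                continue
            if not any(p.exists() for p in candidate_paths(md, target, root)):
                problems.append((md, target))
    return problems
```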

@MichaelCurrin @BillRaymond thanks for the replies! Here is more information about the setup:

My current setup is the following:

Folder structure:
_docs: which is the base folder for everything
_docs/subfolder: which is a folder within the _docs-folder

I have several subfolders under _docs, but this is the basic setup.

In the _config.yml I have defined these collections:

collections:
  docs:
    output: true
    subfolder:
      output: true
      permalink: /docs/:name/
    subfolder2:
      output: true
      permalink: /docs/:name/
      folderInSubfolder2:
        output: true
        permalink: /docs/folderInSubfolder2/:name/

All documents are in Markdown format, and links are written like this:

[Some doc name](markdownFileName.md)

I’m trying to find a way to replace all existing links in the Markdown files so that they work with the defined collections, without doing it all manually.

Hope this clears things up a bit!

Thank you for the answers this far!

I understand the setup of the collections.

I didn’t realize you could nest them. Have you verified the subfolder approach on a small scale?


On to the replacement.

Well, the bit we need is…

How do you know which collection this one is in?

http://docs.something.com/page.html

To get it to be:

http://docs.something.com/docs/collectionName/page

If you have a link like this:

[Page](page.html)

Then that could be anywhere on the site, so I can’t see how to replace it automatically with /docs/collectionName/page/.
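One way to see how big that problem is: group the source files by name and list the names that occur in more than one folder, since a flat link like (page.html) cannot be mapped automatically for those. A small Python sketch; the function names are hypothetical and the example paths are borrowed from the folder layout posted above.

```python
from collections import defaultdict
from pathlib import Path


def folders_by_name(paths):
    """Group source paths by file stem, e.g. 'page' -> every page.md found."""
    groups = defaultdict(list)
    for p in paths:
        groups[Path(p).stem].append(str(p))
    return groups


def ambiguous_names(paths):
    """Return {stem: paths} for stems found in more than one place; links
    like (page.html) cannot be mapped automatically for these."""
    return {stem: found for stem, found in folders_by_name(paths).items() if len(found) > 1}
```

Passing it `[str(p) for p in Path("_docs").rglob("*.md")]` would give the names that would have to be sorted out by hand.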

BTW, if you can use the link tag, you would write []({% link _collectionName/page.md %}) and the ...html path would be worked out for you. In fact, if you change the permalink on the page or in the config, the HTML path will reflect that.


Also, if everything is in docs, is there anything in the root, such as /index.html? If not, maybe you should remove that layer from the path and put everything in the root. It will be simpler to configure and structure, especially if docs is already in your domain name.

You could even make a separate site for just docs.

Maybe you should just do a find-and-replace for each page.

E.g. for the first page.html

Do a global find and replace of page.html with {% link _collectionName/page.md %}.

And repeat for page2.html

Do a global find and replace of page2.html with {% link _anotherCollectionName/page2.md %}.

And repeat for all pages.

You won’t need any regex pattern. You just have to use your own brain to know that page.html belongs in the first collection and page2.html belongs in the 2nd collection (or whatever sub sub directory).
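If typing each replacement gets tedious, the same per-page mapping can be applied by a short Python script. The mapping is still written by hand, exactly as described above, and it uses plain string replacement rather than a regex. Including the Markdown link’s parentheses in the search text is my own tweak, so that page.html does not also match mypage.html. `MAPPING`, `rewrite_links`, and `rewrite_tree` are hypothetical names, and the collection names are the placeholders from this post.

```python
from pathlib import Path

# Filled in by hand: you decide which collection each old page belongs to.
MAPPING = {
    "page.html": "_collectionName/page.md",
    "page2.html": "_anotherCollectionName/page2.md",
}


def rewrite_links(text, mapping=MAPPING):
    """Turn (old.html) link targets into ({% link new.md %}) tags."""
    for old, new in mapping.items():
        # Concatenation instead of an f-string, because of the literal {% %}.
        text = text.replace("(" + old + ")", "({% link " + new + " %})")
    return text


def rewrite_tree(root, mapping=MAPPING):
    """Apply rewrite_links to every .md file under `root`, in place."""
    for md in Path(root).rglob("*.md"):
        original = md.read_text(encoding="utf-8")
        updated = rewrite_links(original, mapping)
        if updated != original:
            md.write_text(updated, encoding="utf-8")
```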

@MichaelCurrin Ah, I see. I had trouble finding enough documentation, and understanding it well enough to make this work correctly. I was thinking of collections as “folders”, so each subfolder under _docs would be a separate collection, and a subfolder within that would be its own collection. This is in order to get permalinks that follow the format “https://docs.something.com/docs/collection/subcollection/page”.

But I could maybe then use a search/replace based on which folder I am in?
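Something along those lines could work if each file name is unique across _docs: walk the folder, record where each page.md lives, and use that as the replacement table for {% link %} tags. A Python sketch under those assumptions (the _docs collection folder from your config, with names that appear in more than one folder skipped for manual handling); the function names are hypothetical.

```python
from collections import Counter
from pathlib import Path, PurePosixPath


def mapping_from_paths(rel_paths):
    """Map 'page.md' and 'page.html' to the source path a {% link %} tag
    expects, e.g. '_docs/subfolder/page.md'. Names found in more than one
    folder are left out, because the old flat links cannot tell them apart."""
    counts = Counter(PurePosixPath(p).name for p in rel_paths)
    mapping = {}
    for p in rel_paths:
        pp = PurePosixPath(p)
        if counts[pp.name] > 1:
            continue  # ambiguous: needs a human decision
        mapping[pp.name] = p
        mapping[pp.with_suffix(".html").name] = p
    return mapping


def mapping_for_site(site_root, collection_dir="_docs"):
    """Build the mapping from the Markdown files actually on disk."""
    root = Path(site_root)
    rel = [p.relative_to(root).as_posix() for p in sorted((root / collection_dir).rglob("*.md"))]
    return mapping_from_paths(rel)
```

Each entry could then drive a replacement such as turning (b.html) into ({% link _docs/subfolder2/b.md %}).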


Would you consider dropping use of collections altogether?

Just structure it like this:

docs/
  abc/
    def/
      foo.md
      bar.md
      index.md
    index.md
  xyz/
    fizz.md
    buzz.md
    index.md
  index.md
index.md

No configuration needed for permalink or collections.

And then bar.md will be under

/docs/abc/def/bar.html

No need to set a permalink.

Well, I like to use a config value such as permalink: pretty, but that’s up to you.
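To make the file-to-URL relationship concrete, here is a rough Python approximation of how, as far as I understand Jekyll’s defaults, a plain page’s URL follows from its path under the default permalink style versus permalink: pretty. It ignores per-page permalink front matter and other permalink styles; `page_url` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def page_url(source_path, pretty=False):
    """Approximate URL of a plain (non-collection) Markdown page, assuming no
    per-page permalink and either the default style or 'pretty'."""
    p = PurePosixPath(source_path)
    folder = ("/" + "/".join(p.parent.parts)) if p.parent.parts else ""
    if p.stem == "index":
        return folder + "/"  # index pages map to their folder
    if pretty:
        return f"{folder}/{p.stem}/"
    return f"{folder}/{p.stem}.html"
```

With the tree above, `page_url("docs/abc/def/bar.md")` gives /docs/abc/def/bar.html, and with `pretty=True` it gives /docs/abc/def/bar/.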