Ready to publish 1,212 G-H Pages blog posts but concerned

In my ~/website/_posts directory I have 1,212 blog posts converted from 2,500 Stack Exchange posts. Only the good posts made the cut.

In my ~/website2/_posts directory I have 4 blog posts “pulled” from my current GitHub Pages Pippim website repository.

Reading these instructions, my understanding is that I need to write a bash script to compare the two directories and issue a set of commands for each new file:

cd ~/website2
cp -a ~/website/_posts/filename.md  ./_posts/
git add ./_posts/filename.md
git status
  • Are the instructions correct?
  • Is there an existing bash script somewhere that already does this?
  • Is there an easier way of pushing an entire directory to GitHub Pages?
  • Note: the first command (cd ~/website2) and the last command (git status) will not be part of the loop iterating over the 1,212 blog post files.
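A minimal sketch of that compare-and-copy loop, run here against throwaway demo directories instead of ~/website and ~/website2 (all paths and file names below are invented for the demo):

```shell
#!/usr/bin/env bash
set -e

# throwaway stand-ins for ~/website/_posts and ~/website2
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir "$dst/_posts"

# demo content: two converted posts, one of which is already in the repo
echo "old" > "$src/2021-01-01-old.md"
echo "new" > "$src/2021-02-02-new.md"
cp "$src/2021-01-01-old.md" "$dst/_posts/"

cd "$dst"
git init -q
git config user.email "you@example.com"
git config user.name  "you"

# copy and stage only the posts that _posts does not already have
for f in "$src"/*.md; do
    name=$(basename "$f")
    if [ ! -e "_posts/$name" ]; then
        cp -a "$f" "_posts/$name"
        git add "_posts/$name"
    fi
done
git status --short
```

In the real run, src would be ~/website/_posts and dst would be ~/website2, and you would finish with a single git commit and git push rather than committing per file.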

I’m a little concerned about how long it will take GitHub Pages to rebuild with twelve hundred blog posts. Currently it takes about 20 seconds. Will this take a lot longer and mean I’ll have to learn how to develop the website locally?

Obviously, after populating the blog post landing page with 1,212 entries, I’ll be doing a lot of development with lookup by date, tag, Stack Exchange site (perhaps a future use for category), etc. I wouldn’t want GitHub Pages commits to go over 30 seconds or so whilst developing online. If they do, perhaps I should consider developing pages locally?

As a side note, a couple of posters recently asked about changing the date or directory name in all posts. It took 1,182 lines of Python code to convert Stack Exchange posts to Jekyll blog posts. Documenting the program seemed to take just as long; the write-up can be found on my website’s Home Page (link valid as of December 5, 2021, though its position may change as the Home Page expands).

Correctness of approach

What you’ve got set up there is correct.

This is a rather unusual use case, moving files across repos, and it’s only a handful of lines to handle a single file. So it’s best to do it yourself as you have, after reading the instructions. As far as I know there is no built-in tool or existing program that does this for you, if that’s what you were looking for.

Bulk copy

It’s not clear, but I am guessing you want your solution to scale so that all 1,212 posts are copied from one directory to the other?

You should just copy the entire posts directory then. I’m leaving out the -a flag as I’ve never needed it and can’t tell what you’d need it for, if at all.

$ cp ~/website/_posts/* ~/website2/_posts/

No need for a for loop, because the glob * expands to every file (or directory) inside the source path.

So that is equivalent to:

$ cp ~/website/_posts/abc.md ~/website2/_posts/abc.md
$ cp ~/website/_posts/def.md ~/website2/_posts/def.md
$ # ...

I’d recommend a single commit, unless you want 1,212 of them:

$ cd ~/website2
$ git add .   # use a dot for top-level, or `git add _posts` for more targeted
$ git status
$ git commit
$ git push

This should also answer “Is there an easier way of pushing an entire directory to GitHub Pages?”: you commit everything added in the directory, then push to GitHub, and GH Pages will build from it.

Performance

Yes, you will likely have performance issues. Jekyll and Ruby are not known for speed. For a site of, say, 50 pages this is okay, but beyond that the build times get slow. I have a few GH Pages sites with about 1,000 pages each, and they take around 3 minutes to build locally (and therefore a similar time on GH Pages).

So one piece of advice here is to use Jekyll 4 instead of Jekyll 3. Jekyll 4 uses caching and other optimizations; this made my builds about 3x faster (3 min down to 1 min) on each big project I applied it to.

You will need GH Actions + GH Pages, or Netlify, or similar, if you want to use Jekyll 4 instead of Jekyll 3. There’s a Deploy section in the Jekyll docs to help there.
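For illustration only, a minimal GH Actions workflow along these lines might look roughly like this (the action names, versions and Ruby version are my assumptions, not something from this thread; check the Deploy section of the Jekyll docs for the currently recommended setup):

```yaml
# .github/workflows/jekyll.yml -- a sketch, not a tested config
name: Build and deploy Jekyll 4 site
on:
  push:
    branches: [main]
jobs:
  build-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: ruby/setup-ruby@v1
        with:
          ruby-version: '3.0'
          bundler-cache: true   # runs bundle install and caches gems
      - run: bundle exec jekyll build
      # publishes _site to the gh-pages branch; set GH Pages to serve from it
      - uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./_site
```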

For the long term, you could consider an alternative such as Hugo, which is reputed to be a much faster static site generator: https://gohugo.io


The -a option on cp is just a habit. It preserves the file’s owner, group and permissions, and keeps the original timestamps instead of resetting them.

The git add . option looks great! So much easier than looping through every filename and issuing a separate command. Because I don’t want to commit the other directories (they’re likely more up to date on the repo and I’d be wiping out those changes), should I be using git add _posts/. instead? (No space between / and .)

Any word on when G-H Pages will move off Jekyll 3.9 to version 4.x? It sounds like many of us could benefit from the speed improvements.

I’ve heard of “Hugo” and that, being written in compiled Go, it is something like 100 times faster than Ruby, which is an interpreted language. Any rumours about G-H Pages converting to Hugo and abandoning Jekyll?

Based on your answer I’ve decided to keep developing the website with only a handful of blog posts. What I might do is set up two websites: one production site with 1,200 posts that is only updated periodically, and a development website that I can keep committing to a dozen or 50 times a day. So far, 1,071 commits in about six weeks. I could set it up such that nightly the development website updates the production website, except for the _posts directory.

I’m not keen on developing locally at this time, but I have started casually looking into what PyCharm hooks are available for GitHub. I’ve also heard MS VS Code is popular with many developers, and Microsoft has GitHub hooks in that product too.

You’ve written to me before about GitHub Actions and cron, which I’ve yet to explore. So much to do, so little time :slight_smile:

Plus I still have to find the guy who implied that converting the Stack Exchange website to a GitHub Pages website would be easy-peasy. It hasn’t been that easy!

Sincere thank you for your well thought out and written answer!


No

The slash and the dot both add no value here. In fact a trailing dot is really uncommon.

Just do

git add _posts

I like to add a slash at the end for readability, and if you use tab completion in the terminal the slash gets added for you anyway.

git add _posts/

There is an open issue on the GH Pages move to Jekyll 4. No date is set, and it won’t be soon; years away, maybe.

I’ve heard nothing about Hugo for GH Pages.


But you can use Jekyll 4 with GH Pages if you use GH Actions.

Or Netlify, which is much easier to configure.

And you can use Hugo too. I don’t have a CI/CD setup to deploy it, but here is a template for a site.


Another option to reduce build time is to make two websites in two repos. Base them on a common theme so they are styled the same.

Then one repo serves at the root and rebuilds quickly with a few pages. Name the repo abc.github.io:

https://abc.github.io
# _config.yml
baseurl: ''

And one site serves on a subpath. The repo’s actual name would be blog. This one would be slower to build, since it holds all the pages:

https://abc.github.io/blog/
# _config.yml
baseurl: '/blog'

Sorry, I don’t understand. My website for the general public (production) is called pippim.github.io. I thought my secret website would be pippim.github.io/dev or pippim.github.io/test.

Then I thought each night (actually 4 am) a cron job would compare the production version to the development or test version to see if changes were made the day before. If so, it would push all the files except ../_posts into pippim.github.io and launch an action to rebuild the site?

I am suggesting, instead of one site using a private and a public repo:

Rather make two sites, both public repos.

And tie them together using a navbar so it is seamless.

The root content site rebuilds when you change one of, say, 10 pages.

And the blog site of external content runs a cron job to pull in content.

Most of my commits come from changing index.md, _layouts/, assets/css/style.scss and assets/js/ directory. It varies from about 10 to 50 commits a day.

Today (December 12, 2021) there are now 1,218 blog posts (no longer 1,212) because some questions or answers have been up-voted in Stack Exchange and gone above the 2 up-vote threshold for converting Stack Exchange to GitHub Pages.

My current line of thought is a bash script that forks pippim and deletes all but a handful of blog posts.

Then I can continue development on the fork with the usual 15-second commit time, because there are only a handful of blog posts to render. This same bash script could be used by others who want to fork the Pippim website and turn it into their own website.

When ready I can run another bash script that restores the deleted blog posts and pushes the fork back into origin/main.

I hope that makes sense and more so I hope that is “doable”?

NOTE: Some testing would be best done on the main website’s Blog Post landing page for things like Tag Accordions, Title Accordions and Date Accordions. The landing page is currently in answers.md and doesn’t do anything special; it simply lists a half-dozen blog posts in descending chronological order.

Check out git submodules.

A fork can get messy if you want to have them diverge with different content or subset of content.

Make one repo which has all the posts. No other content.

Then a second repo which has the markdown, JS, CSS, a few pages. And also references the first repo as a submodule.

What this does is effectively clones one repo inside the other when you are working locally but without making the git commit history complex. The submodule will look like a folder, with whatever name you want e.g. _posts. And it will reference a specific commit in that posts data repo.

That is a neat way to structure things. And makes your data easily available for reuse by others.
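A throwaway demo of that structure (repo names and paths are invented; with real GitHub repos you would pass the data repo’s URL to git submodule add):

```shell
#!/usr/bin/env bash
set -e
workdir=$(mktemp -d)

# the "data" repo: holds only the posts
git init -q "$workdir/posts-data"
cd "$workdir/posts-data"
git config user.email "you@example.com"; git config user.name "you"
echo "a post" > 2021-12-05-example.md
git add . && git commit -qm "add a post"

# the "site" repo: markdown, JS, CSS, a few pages -- plus the submodule
git init -q "$workdir/site"
cd "$workdir/site"
git config user.email "you@example.com"; git config user.name "you"
echo "home" > index.md
git add . && git commit -qm "initial site"

# reference posts-data as a folder named _posts
# (-c protocol.file.allow=always is only needed because this demo uses
#  local paths; an https URL would not need it)
git -c protocol.file.allow=always submodule add "$workdir/posts-data" _posts
git commit -qm "add _posts submodule"

cat .gitmodules
```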

It does take research and practice to get familiar with submodules.


Also, having a submodule won’t make the build time fast. Jekyll will still see all the content inside and outside the submodule.

And you’ll still have to pull the posts repo in manually to update the content.


And by the way, you could store posts as 100% YAML content with no HTML. You could also set a default layout in the config file so all posts get the post layout.

To get fast local build times, what you could do is delete the contents of the _posts directory, except maybe leave one or two posts.

And leave that deletion uncommitted.

So locally you have a fast building site with little content and remotely you have a slow but complete build.

And this deletion works whether you use a plain repo structure or submodules.
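Here is that trick in a throwaway repo (file names invented for the demo):

```shell
#!/usr/bin/env bash
set -e

# throwaway repo with three committed posts
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"; git config user.name "you"
mkdir _posts
for i in 1 2 3; do echo "post $i" > "_posts/post-$i.md"; done
git add . && git commit -qm "all posts"

# delete every post except one, WITHOUT committing the deletion
find _posts -name '*.md' ! -name 'post-1.md' -delete
ls _posts                 # only post-1.md remains: local builds are fast

# later, restore the deleted posts from the last commit
git checkout -- _posts
ls _posts                 # all three posts are back
```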

Oh, one other idea. You could add _posts to the exclude list in the config so the posts don’t get built:

# _config.yml

exclude:
  - _posts
  - vendor/   # exclude gems from build output
  - Gemfile   # Gemfile.lock is implied because of how matching works.
  - LICENSE
  - README.md

Just remember to also list the other entries shown above, which Jekyll excludes by default. If your exclude list contains only the one item, the defaults get overridden and the site will build wrong.

See it in use.


You can have a file named _config_local.yml.

Then load it only locally, combining the two configs:

jekyll serve --config _config.yml,_config_local.yml

And use the standard serve or build command to use only the main config.
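For example (my assumption of how you might pair this with the exclude idea above), a local-only override could be:

```yaml
# _config_local.yml -- loaded last, so its keys override _config.yml
exclude:
  - _posts    # skip the ~1,200 posts for fast local builds
  # re-list any other excludes you rely on, since this list replaces
  # the one in _config.yml rather than merging with it
```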

A lot to think about in your two answers today. At my slow learning rate, I’ll get back in a couple of weeks / months. Quick point: the blog posts have extensive HTML navigation bars, tables of contents, image styling, a More/Less button cookie, and copy-code-block-to-clipboard support. Therefore I’m not sure storing the blog posts in YAML format instead of .md format will work…

Another point: on my development/testing repo I want ALL of my half-dozen posts to be rebuilt with each commit, because I want to see how the latest coding changes are rendered in HTML.

Oh BTW my _config.yml file has nothing fancy and only contains:

theme: jekyll-theme-cayman
title: Pippim - Free Software
description: Free Open-Source Software for the World. Free of Ads Too!

Hi Michael,

I think I was a bit of a tempest in a teapot, or making much ado about nothing.

I created an HTML file with 3,633 tags cross-referencing 1,172 blog posts (they weren’t actually populated in _posts/ yet though) and build time went from 30 seconds to 4 minutes.

I was terrified to put all of _posts/ into GitHub Pages after that speed penalty. However, I pulled the trigger and, much to my surprise, build time is still only four minutes, or even less sometimes.

I actually don’t see a need for a development environment anymore.

Here’s what the tags look like:

(screenshot: Pippim post tags expanded, with scrolling)

If anyone would like to try them out, here’s the Pippim Website’s Answers link.

Thanks for all your help Michael!
