Build posts to a text file?

When creating a blog in Jekyll, I write it in markdown. Then, Jekyll converts that markdown to HTML.

I would like to add an extra step in that process that converts the markdown to text, stripping out the YAML front matter and any markdown syntax. Alternatively, it could convert the HTML output to text, but the idea is that I would have a .txt file that anyone could read, with no markup in it.

Looking online, I see lots of web services and paid APIs that can do such a conversion, but I am looking to add the converter such that any Jekyll build does the conversion for me. Alternatively, rather than doing the conversion during a Jekyll build, it would do the conversion using a GitHub Actions workflow. Either way, I would prefer this to be an open-source solution.

Has anyone had any experience with this? I suppose it would not be incredibly difficult to create a converter since there are plenty of straightforward ways to strip markup language from a file, but if something already exists, I would appreciate using that.

Unfortunately, my searches have not pulled anything up on this matter, so either I am searching for the wrong thing or there is an approach I have not thought of.

Thanks!

I think you can do this within Jekyll.

So you want to turn

## My heading

- A
- B

Into

My heading

A
B

Maybe keep structure of bullet points?

Anyway, you can do this in a layout, say your post layout.

---
layout: null
---
{{ content | markdownify | strip_html }}

That would render markdown in the body as HTML and then strip the tags to make it plain text.

Before stripping the HTML, you could add an extra step that uses the replace filter. For example, you could add an extra blank line wherever there is a single newline or a bullet point, or insert a literal dash inside each li tag so the bullets survive.
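To illustrate the "replace before stripping" idea outside of Liquid, here is a minimal Ruby sketch. The helper name `html_to_text` and the entity table are my own assumptions, not part of Jekyll, and the regular expressions are a rough approximation of what strip_html does rather than a full HTML parser.

```ruby
# Hypothetical helper (not a Jekyll API): convert rendered HTML to plain text
# while keeping a dash in front of each list item.
ENTITIES = {
  "&amp;" => "&", "&lt;" => "<", "&gt;" => ">", "&quot;" => '"', "&#39;" => "'"
}.freeze

def html_to_text(html)
  text = html.gsub(/<li[^>]*>/i, "- ")                   # mark bullets before tags are removed
  text = text.gsub(%r{</(?:p|h[1-6]|ul|ol|div)>}i, "\n") # extra newline after block elements
  text = text.gsub(/<[^>]+>/, "")                        # strip all remaining tags
  text = text.gsub(/&(?:amp|lt|gt|quot|#39);/) { |e| ENTITIES[e] } # decode common entities
  text = text.gsub(/^[ \t]+/, "")                        # drop indentation left by nested tags
  text.gsub(/\n{3,}/, "\n\n").strip                      # collapse runs of blank lines
end

html = "<h2>My heading</h2>\n\n<ul>\n  <li>A</li>\n  <li>B</li>\n</ul>\n"
puts html_to_text(html)
```

With the sample input, the heading is kept on its own line and the two list items come out as "- A" and "- B", so the bullet structure is still visible in the text version.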

Note that my approach above replaces every post with a plain-text version, but the output files still have the .html extension.

You could instead write a Ruby plugin, based on the Generators section of the docs, that gives you two outputs for each post: the traditional one such as foo.html, and another one, foo.txt, containing only text by using my logic above.
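As a rough sketch of the "two outputs per post" idea: the version below uses a Jekyll hook rather than a Generator, because a hook runs after the post has been rendered and written. The module name `PlainTextCopy`, the file name, and both helpers are hypothetical. It also assumes that `post.content` holds the converted HTML by the time the `:post_write` hook fires, which is worth checking against your Jekyll version.

```ruby
# _plugins/plain_text_copy.rb (hypothetical file name)
module PlainTextCopy
  module_function

  # Map an output path such as "_site/2021/01/01/foo.html" to "_site/2021/01/01/foo.txt".
  def txt_path_for(html_path)
    html_path.sub(/\.html?\z/, ".txt")
  end

  # Crude tag stripping, comparable in spirit to Liquid's strip_html filter.
  def plain_text(html)
    html.gsub(/<[^>]+>/, "").gsub(/\n{3,}/, "\n\n").strip + "\n"
  end
end

# Register the hook only when Jekyll has loaded this file as a plugin,
# so the helpers above can also be tried out in plain Ruby.
if defined?(Jekyll)
  Jekyll::Hooks.register :posts, :post_write do |post|
    html_path = post.destination(post.site.dest) # where Jekyll wrote the .html file
    File.write(PlainTextCopy.txt_path_for(html_path), PlainTextCopy.plain_text(post.content))
  end
end
```

Custom plugins in `_plugins` do not run on the default GitHub Pages build, so this approach would need a local build or a GitHub Actions workflow that runs Jekyll itself.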

You could go a step further and use JSON instead of text to build a kind of API.

/api/posts/2020/01/02/foo.json
{ "content": "Body of the post.\nSecond line."}

I’d say a JSON API would be powerful. You could build a backend and then use a React app on the same or different domain to serve your content. Other people could load content of a given post on their frontend.

If you are happy with the JSON approach, you can save yourself a lot of trouble around making a plugin and dealing with the paths of many text files that each contain only a bit of data, and around figuring out how to make all those paths discoverable, as in a site map.

You could output a single data.json file at the root of your repo, using front matter and Liquid. You can iterate through all posts and pages in your repo and output them as items in a list, or as a hash using the path as the key.

The content field for each item can use the conversion approach in my first comment.
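To make the shape of that single file concrete, here is a small Ruby sketch that builds a hash keyed by path and serializes it. The sample posts, the `title` field, and the `build_index` helper are made up for illustration. In a real site the same structure would come from iterating over the posts in Liquid or in a plugin.

```ruby
require "json"

# Hypothetical input: one hash per post, with the body already converted to plain text.
posts = [
  { path: "/2020/01/02/foo", title: "Foo", content: "Body of the post.\nSecond line." },
  { path: "/2021/01/01/my-blog", title: "My blog", content: "Another post." }
]

# Build a hash that uses each post's path as the key.
def build_index(posts)
  posts.each_with_object({}) do |post, index|
    index[post[:path]] = { "title" => post[:title], "content" => post[:content] }
  end
end

json = JSON.pretty_generate(build_index(posts))
puts json
```

A consumer can then fetch the one file and look up a post by its path, which is also what makes it usable as a search index.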

The result will be larger than many small files, but still reasonable even for 100 posts because it is text and not styling.

A single file at a single known path is convenient to build and, on the other side, to consume.

You can also more easily use it as a search index, or as a page index for a React app, Next.js, or anything else that reads pages from JSON data.

I have a gist to get you going. I’d help with the code or a proof of concept if you like.

Thank you for the thoughts. I had been thinking if there is a post called “2021-01-01-my-blog.md”, there would be a new “2021-01-01-my-blog.txt”. However, your idea of doing a JSON output to a single file might be easier and more straightforward. Okay, I will look into this idea before I go too deep down the rabbit hole. Also, I like the idea of not having a lot of extra files.

Thanks for the idea!

To clarify,

.md file would build as .html
And a generator would make the .txt at the same path (or another path)

If you start by naming your post .txt with front matter, then it should output as .txt (but only one file). It might also not be recognized as a post if Jekyll needs .md for posts. But pages should be fine with any extension.

If you use data.json with frontmatter, it will build as a file data.json as well, keeping the extension. I’ve tested this out before.

Glad to help. I hope one of my ideas works out.

@BillRaymond I put together this gist for you after local testing.

I forgot about the limitation where Jekyll filters can’t render Liquid expressions. So you’re going to need to use a plugin if you want to evaluate {{ }} or {% %}.