Importing from WordPress

jehoshua7 · June 7, 2021, 11:02pm

I have been reading a few posts about importing from WordPress; an interesting one is "Wordpress import + custom CSS + newbie = help!"

There is also a WordPress plugin to export from WordPress to Jekyll: Jekyll Exporter (on the WordPress.org plugin directory).

Are there any preferred methods, please?

MichaelCurrin · June 8, 2021, 9:15pm

Try the method that seems easiest. If it doesn’t give you what you want, try the next one, and the next, and then post back with what you found.

Have you read the Jekyll docs?

For blog migrations, they point to the Jekyll importer, which handles both self-hosted WordPress and WordPress.com.

If you have issues separating your theme from your content, I’d say see whether you can use WordPress itself to export your content as CSV or whatever, or even as an RSS feed of pages and posts. Then you can use that to build the pages of your site.

Then you’ll have to find someone who has made a Jekyll version of the exact WordPress theme you used, or find a close theme, or build your own.

rahil627 · January 11, 2022, 6:41pm

I know this is old, but… I feel the same way as the OP: there are quite a few ways to migrate from WordPress (several(?) WordPress plugins, the jekyll-importer Ruby gem), but I wonder: what are the differences?

It seems the official jekyll-importer command-line program takes the content straight from the database, whereas Ben Balter’s jekyll-exporter WordPress plugin tries to get the final content, after it has all been processed through WordPress…

But what about front matter, tags, categories, meta stuff, permalinks, server stuff, etc.? It would be nice to know…

From the importer docs: "This importer only converts your posts and creates YAML front-matter. It does not import any layouts, styling, or external files (images, CSS, etc.)."

At least there’s that bit…

I’m probably the last person on earth to move off WordPress ;(

rahil627 · January 13, 2022, 11:03pm

At first, I thought WordPress just pooped out an XML file… maybe I got that from elsewhere :? The next time, I clicked Tools -> Jekyll Export and it gave me a .zip file. In the zip there are _posts, _drafts, and a few other root-level files… but what’s strange is that I don’t think I received any pages… I’m not complaining, though!

The WordPress plugin does indeed grab the final HTML, i.e. the content after it has been processed by plugins (’n themes?). I had a table-of-contents plugin, so a hard-coded table of contents ended up at the top of many of my longer posts. If there’s too much HTML crap in the final output, it might be best to spend the time on the jekyll-importer program to get just the original post content, which is likely much cleaner, thereby saving a lot of time…

Well, maybe. Just getting that program running was no joke. If you know Linux, then I guess you’re used to errors(??), but I spent one entire night setting it up on my VPS, with help from several Stack Overflow pages. So, choose your poison: installing Ruby gems vs. search ’n replacing all of your posts.

So, in the end, I’d choose the jekyll-importer method for most posts! Then, for just the few plugin-heavy posts/pages, use the WordPress plugin.
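If you go the search ’n replace route on the plugin output instead, a minimal Python sketch could look like this. It is not taken from any of the tools discussed here: `clean_posts` and `PATTERNS` are hypothetical names, and the `<div class="toc">` pattern is a placeholder assumption, so swap in whatever markup your own table-of-contents plugin actually injected.

```python
import re
from pathlib import Path

# Hypothetical pattern: replace it with whatever markup your table-of-contents
# plugin actually injected. re.S lets ".*?" match across newlines, and the
# non-greedy match stops at the FIRST </div>, so nested divs will break it.
PATTERNS = [
    (re.compile(r'<div class="toc">.*?</div>\s*', re.S), ""),
]


def clean_posts(posts_dir, patterns=PATTERNS):
    """Apply each (regex, replacement) pair to every file in posts_dir.

    Files are rewritten in place; returns the names of the files that changed.
    """
    changed = []
    for path in sorted(Path(posts_dir).iterdir()):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        new_text = text
        for regex, replacement in patterns:
            new_text = regex.sub(replacement, new_text)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            changed.append(path.name)
    return changed
```

Run it from the site root as `clean_posts("_posts")`, ideally on a copy (or under git) so you can diff the result before committing to it.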

An example of the front matter produced by the WordPress plugin:

---
id: 3501
title: 'Organized Things I&#8217;ve Written'
date: 2014-01-19T07:17:27-05:00
author: Rahil
layout: page
guid: http://www.rahilpatel.com/blog/?p=3501
permalink: /valuable-things-ive-written
medium_post:
  - 'O:11:"Medium_Post":11:{s:16:"author_image_url";s:74:"https://cdn-images-1.medium.com/fit/c/200/200/1*dmbNkD5D-u45r44go_cf0g.png";s:10:"author_url";s:28:"https://medium.com/@rahil627";s:11:"byline_name";N;s:12:"byline_email";N;s:10:"cross_link";s:2:"no";s:2:"id";s:12:"122fbba35cf6";s:21:"follower_notification";s:3:"yes";s:7:"license";s:19:"all-rights-reserved";s:14:"publication_id";s:2:"-1";s:6:"status";s:6:"public";s:3:"url";s:70:"https://medium.com/@rahil627/organized-things-ive-written-122fbba35cf6";}'
categories:
  - Essays
  - Experience
  - Organization
  - Personal
  - Self-assessment
  - Thoughts
---

boom.
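The `medium_post` entry (serialized data apparently left over from a Medium plugin) and the `&#8217;` entity in the title are the kind of front matter you’d probably want to tidy up. Here is a minimal Python sketch that drops named top-level keys and decodes HTML entities. It assumes simple line-based front matter like the example above and deliberately avoids a YAML library; `clean_front_matter` is a hypothetical helper. One caveat: decoding `&#39;` inside a single-quoted value would break the YAML quoting, so check the output.

```python
import html
import re


def clean_front_matter(text, drop_keys=("medium_post",)):
    """Drop unwanted top-level keys from a post's front matter and decode
    HTML entities (e.g. &#8217;) in the lines that are kept.

    Line-based sketch: assumes "key: value" lines at column 0, with any
    nested values (like list items) indented beneath their key.
    """
    lines = text.split("\n")
    if lines[0].strip() != "---":
        return text  # no front matter: leave the file untouched
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return text
    kept = []
    skipping = False
    for line in lines[1:end]:
        # A match only happens for unindented lines, i.e. a new top-level key.
        key = re.match(r"([A-Za-z_][\w-]*):", line)
        if key:
            skipping = key.group(1) in drop_keys
        if not skipping:
            kept.append(html.unescape(line))
    # The body (everything from the closing "---" on) is left as-is.
    return "\n".join(["---"] + kept + lines[end:])
```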

rahil627 · January 20, 2022, 12:54pm

Okay okay, last update: I ended up going with a third solution! The very simple ’n straightforward Python exitwp.py script (if you can use a command line), which is also listed on the official Jekyll import page for WordPress.

Here are my concluding thoughts. If I don’t move the file, you can see my complete diff test here; the various imported sites are in that folder too, if you want to really understand all of the differences.

conclusion

The Python script is nearly perfect for parsing both the body content and the front matter; you just have to watch out for anything the Markdown parser ate up. It also provides a config file for easily editing options. And the script is so simple that you can edit it yourself (you’d probably just have to change some options for the html2text and markdownify libs it uses…).
The WordPress plugin is also really, really good. You’d just have to get rid of some extra front matter (from a Medium plugin) and use a Markdown converter for things like links. Only extra newlines are lost.
The jekyll-importer via XML was the worst: it adds HTML tags, removes extra newlines, and preserves wayyy too much front matter (SEO plugins??). It may still be worth keeping just for the sake of preserving data, as a kinda last backup. I’m not sure whether it can be configured, because Jekyll’s docs are so sparse…

body diff

python script

converts HTML into text (via html2text), then
converts text into Markdown (via markdownify?)
yet it preserves all original newlines (although they will not display correctly in Markdown)
may eat a few tags that were in the original post (my <cite> is missing)
provides options in the config file, including a little spot to replace patterns (see exitwp/config.yaml in the some-programs/exitwp repo on GitHub)
xmllint changed nothing for me, but the instructions say to use it

the wordpress plugin

adds no HTML of its own, but preserves HTML that was originally in the post
newlines are sorta prettified (like when you tell a text editor to clean up / re-format your code): consistently two newlines between everything
doesn’t convert text into Markdown, so the links are still in HTML, which is a pretty good idea for the sake of preservation, as things can get eaten up by Markdown converters
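If you do want Markdown links later, a real converter (like the html2text/markdownify libs the Python script relies on) is the safer route. As a minimal stdlib-only stand-in, a regex can cover the simple `<a href="...">text</a>` case. This is a naive sketch, not how any of the tools above do it: it assumes a double-quoted `href`, ignores other attributes, and leaves any inline tags inside the link text untouched. `html_links_to_markdown` is a hypothetical name.

```python
import re

# Naive pattern: only handles <a ... href="..." ...>text</a> with a
# double-quoted href. re.I matches <A HREF=...>; re.S lets link text span lines.
LINK_RE = re.compile(r'<a\s[^>]*?href="([^"]*)"[^>]*>(.*?)</a>', re.I | re.S)


def html_links_to_markdown(text):
    """Replace simple HTML anchors with Markdown [text](url) links."""
    return LINK_RE.sub(lambda m: f"[{m.group(2)}]({m.group(1)})", text)
```

Keeping the HTML links, as the plugin does, is still the more conservative choice if preservation matters most.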

jekyll-importer (via xml)

converts content into HTML (no matter what it originally was)
adds <p> and <h> tags everywhere (even where they weren’t in the original post!!)
correctly converts single newlines into <br /> where needed, which displays nicely in Markdown
squashes all content, removing all of the vertical space

jekyll-importer via database

todo

front matter diff:

python exitwp

preserves what’s essential and names the keys nicely, including the slug and the complete link

php(?) wordpress plugin

preserves a good amount of data, just excludes the “meta” crap
preserves the original link (baseurl/?p=1343), which might be useful for old links
removes the /blog from the permalink key (which is nice for me!)

ruby jekyll-importer

preserves the most data, including published, status, and all of the “meta” stuff (SEO plugins included)

last update? / drafts

Damn! Turns out those three methods extracted different numbers of drafts: 5, ~30(?), and ~80 (the post counts were equal, though).

The Python script only caught 5; it marks them with published: false but leaves them in the _posts folder, so I had to use a grep ’n mv script to move them.
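A minimal Python sketch of that grep ’n mv step (a stand-in, not the script actually used here). It assumes each draft has a literal `published: false` line inside its front matter and the usual `YYYY-MM-DD-` filename prefix. Since Jekyll treats drafts as posts without a date in the filename, it strips that prefix on the way into `_drafts`. Both function names are hypothetical.

```python
import re
import shutil
from pathlib import Path

DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}-")


def is_unpublished(path):
    """True if the file's front matter contains a `published: false` line."""
    lines = path.read_text(encoding="utf-8").split("\n")
    if lines[0].strip() != "---":
        return False
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of front matter: ignore the post body
        if re.fullmatch(r"published:\s*false\s*", line):
            return True
    return False


def move_unpublished(site_dir):
    """Move posts marked `published: false` from _posts to _drafts.

    The date prefix is dropped from the filename; an existing draft with
    the same name is never overwritten. Returns the new draft filenames.
    """
    posts = Path(site_dir) / "_posts"
    drafts = Path(site_dir) / "_drafts"
    drafts.mkdir(exist_ok=True)
    moved = []
    for path in sorted(posts.iterdir()):
        if path.is_file() and is_unpublished(path):
            target = drafts / DATE_PREFIX.sub("", path.name)
            if target.exists():
                continue  # don't clobber an existing draft
            shutil.move(str(path), str(target))
            moved.append(target.name)
    return moved
```

Call it as `move_unpublished(".")` from the site root.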

The WordPress plugin caught a lot more.

Jekyll’s own Ruby program caught the most, though, including revisions and auto-saves…

So… in case anyone ever does this: try to finish those drafts before exporting! THEN you can easily use that WordPress plugin or Python script.
