Converting PDFs to Jekyll HTML posts?


This thread is really a request for advice rather than a help request, but I couldn’t find a more suitable section, so I posted it under “help”. I apologize if that was a mistake.

We are migrating from a non-Jekyll site to Jekyll and have a large number of PDF documents. Can anyone suggest the steps I should take to convert the PDFs into a format Jekyll accepts (HTML)?

I can convert each PDF to DOCX and then manually copy the text into .md files, but there are more than 3000 PDFs, so manual copying is not practical.

I would like to avoid embedding the PDFs with iframes or similar methods; embedding is my last resort.

My question is:

Is there a simpler tool or method for converting PDFs into .md (HTML)?

Thank you

You could possibly use Marker for the batch conversion.


Since you are using Jekyll, I assume you have Ruby installed. I found this Gist that might be helpful. I haven't tested it myself, though.


Search engines index PDFs, so you could just create pages that link to them. Would that be sufficient, or do you also care about the new formatting?
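If linking is enough, the index page could be generated rather than written by hand. Here is a minimal Python sketch of that idea. The function name `build_pdf_index`, the `/assets/pdfs` URL prefix, and the `page` layout are all assumptions for illustration; adjust them to wherever your PDFs live and to whatever layouts your theme provides.

```python
import json
from pathlib import Path
from urllib.parse import quote


def build_pdf_index(pdf_names, title="Documents", base_url="/assets/pdfs"):
    """Return the text of a Jekyll page that links to each PDF.

    base_url and the "page" layout are assumptions; change them to match
    your site.
    """
    # Front matter; json.dumps gives a double-quoted string that is valid YAML.
    lines = ["---", "layout: page", f"title: {json.dumps(title)}", "---", ""]
    for name in sorted(pdf_names):
        # Use the file name without extension as a readable link label.
        label = Path(name).stem.replace("-", " ").replace("_", " ")
        # Percent-encode the file name so spaces do not break the link.
        lines.append(f"- [{label}]({base_url}/{quote(name)})")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_pdf_index(["annual_report.pdf", "meeting notes.pdf"]))
```

You would write the returned text to something like `documents.md` in the site root and copy the PDFs under the matching assets folder.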

You can convert the .docx files to .txt and then generate Markdown from them with a PHP script. I think the quality of the result will depend on how consistent your PDF files are.
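The text-to-Markdown step might look roughly like this, sketched here in Python rather than PHP. It assumes the first line of each .txt file is the document title and that blank lines separate paragraphs, which may not hold for your files. The helper names `slugify` and `text_to_post` and the `post` layout are made up for the example; the `YYYY-MM-DD-title.md` file name is the pattern Jekyll expects in `_posts`.

```python
import json
import re
from datetime import date


def slugify(title):
    """Lowercase the title and replace runs of other characters with '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def text_to_post(text, post_date):
    """Turn plain text into (filename, markdown) for a Jekyll post.

    Assumption: the first line is the title and blank lines separate
    paragraphs. Real PDF extractions may need more cleanup than this.
    """
    title, _, rest = text.strip().partition("\n")
    title = title.strip() or "Untitled"
    # Re-join hard-wrapped lines inside each paragraph.
    paragraphs = [" ".join(p.split()) for p in re.split(r"\n\s*\n", rest.strip())]
    body = "\n\n".join(p for p in paragraphs if p)
    front_matter = (
        "---\n"
        "layout: post\n"
        f"title: {json.dumps(title)}\n"
        f"date: {post_date.isoformat()}\n"
        "---\n\n"
    )
    filename = f"{post_date.isoformat()}-{slugify(title)}.md"
    return filename, front_matter + body + "\n"


if __name__ == "__main__":
    name, markdown = text_to_post("My Title\n\nSome text.\n", date(2024, 1, 15))
    print(name)
    print(markdown)
```

Looping this over a folder of .txt files and writing each result into `_posts/` would cover the batch case, but headings, tables, and images would still need separate handling.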