Generating a PDF from a folder of Markdown files
Published under the Coding category.
To print the latest version of my blog, I used Pandoc, a self-described "universal document converter." I love using Pandoc. I last used the tool for the first print edition of my blog, and I had a lot of success. Back then, however, I did the post-processing work in Google Docs rather than programmatically, so I spent a lot of time making formatting changes by hand. This was a time-consuming, less-than-ideal task.
This time around, I decided to research more of the flags available in Pandoc. I learned about the table of contents features, the metadata features for adding information to the top of the book, formatting options, and more. Pandoc even supports code syntax highlighting to some degree, although I did not research this feature in much depth.
To generate the latest print edition of my blog, I wrote a bash script with two steps:
- Process the markdown files in which my blog posts are stored: extract the title from the front matter (metadata) of each post and place it at the top of a new document, followed by the text of the post, with the metadata left out.
- Use Pandoc to combine all of the new markdown documents into a single PDF file.
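The script for step one is not reproduced in this post. As a rough illustration only, here is a minimal sketch of how that step could work, assuming each post starts with YAML front matter delimited by `---` lines and contains a single-line `title:` field. The function names `extract_post` and `preprocess_posts`, and the `_posts` folder in the usage note, are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a post with its front matter replaced by a
# top-level heading built from the "title:" field.
# Assumes the front matter is the first block in the file, fenced by "---".
extract_post() {
  awk '
    NR == 1 && $0 == "---" { in_fm = 1; next }   # opening delimiter
    in_fm && $0 == "---" {                       # closing delimiter
      in_fm = 0
      print "# " title                           # title becomes a heading
      print ""
      next
    }
    in_fm && /^title:/ {
      sub(/^title:[ \t]*/, "")                   # drop the key
      gsub(/^"|"$/, "")                          # drop surrounding double quotes
      title = $0
      next
    }
    in_fm { next }                               # discard other metadata
    { print }                                    # body passes through unchanged
  ' "$1"
}

# Hypothetical helper: run extract_post over every .md file in a source
# folder and write the results, under the same file names, to an output folder.
preprocess_posts() {
  src_dir=$1
  out_dir=$2
  mkdir -p "$out_dir"
  for post in "$src_dir"/*.md; do
    [ -e "$post" ] || continue                   # skip if the glob matched nothing
    extract_post "$post" > "$out_dir/$(basename "$post")"
  done
}
```

Calling `preprocess_posts _posts posts` would then fill a `posts` folder with cleaned-up files ready for Pandoc. Multi-line titles or titles wrapped in single quotes would need extra handling that this sketch leaves out.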
Here is the command I used for step two:
pandoc -s posts/* -o book.pdf --toc --toc-depth=2 --pdf-engine=xelatex \
--metadata title="James' Coffee Blog" --metadata author="James (jamesg.blog)" --metadata date="February 25th, 2023" \
--metadata lang="en-US" --metadata mainfont="Times New Roman" --metadata fontsize="12pt" \
--metadata geometry:margin=1in --metadata geometry:a4paper \
--highlight-style pygments \
--metadata link-citations=true
With this one command, I was able to produce a print-ready version of my blog after doing the requisite preprocessing. I didn't have to spend any time in Google Docs patching up the formatting.
With that said, I did learn that you need to make sure any images to which you refer in your markdown files resolve correctly. I replaced all instances of /assets/ in my markdown files with an absolute path that points to the local assets stored on my computer. My SSG knows /assets/ refers to an external link, but Pandoc doesn't, so I needed to amend the image paths appropriately.
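A path rewrite like this can be done with `sed`. The sketch below is one possible approach, not the exact command I ran: the function name `rewrite_asset_paths` and the example directory are hypothetical, and it deliberately matches only `](/assets/` (the start of a markdown link or image target) so that full URLs that happen to contain `/assets/` are left alone.

```shell
#!/bin/sh
# Hypothetical helper: print a markdown file with site-relative /assets/
# link targets rewritten to point at a local directory.
#   $1 = markdown file
#   $2 = absolute path of the local assets directory
# Assumes the directory path contains no "|", "&", or "\" characters,
# since those are special in the sed replacement below.
rewrite_asset_paths() {
  assets_dir=${2%/}                              # drop one trailing slash, if any
  sed "s|](/assets/|](${assets_dir}/|g" "$1"
}
```

For example, `rewrite_asset_paths post.md /home/me/site/assets > fixed.md` would turn `![Cup](/assets/cup.png)` into `![Cup](/home/me/site/assets/cup.png)`. Images embedded with raw HTML `<img>` tags would need a second pattern.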