Processing your WordPress content
How to convert your WordPress content into Markdown, a format that CloudCannon (and SSGs in general) can work with.
Now comes the transformation stage: turning your WordPress posts into clean Markdown files.
Along the way we'll clean up any leftover WordPress-specific formatting, fix image paths, and set up a consistent structure for your content's metadata (what we call "front matter" in the static site world).
Convert your WordPress export to Markdown
I recommend using Will Boyd's wordpress-export-to-markdown tool; it's well supported and has a good track record of producing Markdown that's compatible with a range of static site generators. It will also separate and save your image files. Another popular option, though not currently maintained, is Thomas Frössman's configurable exitwp tool.
Note that whichever tool you use, you may still need to make some changes if you have a large or complicated WordPress site.
- Install your chosen conversion tool (this one needs Node.js; the -g flag installs it globally so the command is available on your path):
npm install -g wordpress-export-to-markdown
- Run the conversion (note that here we've renamed the WordPress export file to export.xml and placed it in the same directory we're running the script from):
wordpress-export-to-markdown --input="export.xml" --output="markdown-output"
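Once the conversion finishes, a quick count can show whether anything was skipped. The sketch below is only a rough check: the function names are hypothetical, and it assumes the usual WordPress export layout, where every item carries a wp:post_type element on its own line. Drafts, pages, or custom post types may make the two numbers differ legitimately.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing the export with the converted output.

# Count items in the export whose post type is "post".
# Assumes one <wp:post_type> element per line, with or without a CDATA wrapper.
count_exported_posts() {
  grep -cE '<wp:post_type>(<!\[CDATA\[)?post(\]\]>)?</wp:post_type>' "$1"
}

# Count Markdown files under the output directory, including subfolders.
count_markdown_files() {
  find "$1" -type f -name '*.md' | wc -l | tr -d ' '
}
```

For example, run count_exported_posts export.xml and count_markdown_files markdown-output, then compare the two numbers.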
Clean up your content
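Now we have some quite messy Markdown files! Let's get rid of the WordPress shortcodes and any surplus empty lines, fix up our image paths, and get started on a front matter template.
- Process Markdown files (these commands use GNU sed and edit files in place, so work on a copy or in a committed Git repository):
# Remove WordPress shortcodes such as [gallery] or [/caption], leaving Markdown links like [text](url) alone
sed -i -E ':a; s/\[\/?[a-z][a-z0-9_-]*( [^]]*)?\]([^(]|$)/\2/; ta' *.md
# Fix image paths
sed -i 's|wp-content/uploads|/assets/images|g' *.md
# Collapse runs of empty lines into one (single empty lines separate Markdown paragraphs, so they stay)
sed -i '/^$/N;/\n$/D' *.md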
- Create a front matter template:
---
layout: post
title: "{{ post_title }}"
date: {{ post_date }}
categories:
- {{ categories }}
tags:
- {{ tags }}
featured_image: {{ featured_image }}
excerpt: {{ post_excerpt }}
---
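With the template in place, it can help to confirm that every converted file actually carries front matter. The following is a minimal sketch, not part of any tool mentioned here: check_front_matter is a hypothetical helper, and it assumes front matter opens with --- on the first line and should contain title and date keys.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list Markdown files whose front matter looks incomplete.
check_front_matter() {
  find "$1" -type f -name '*.md' | sort | while IFS= read -r file; do
    # The first line must be the opening front matter delimiter.
    if [ "$(head -n 1 "$file")" != "---" ]; then
      echo "$file: missing front matter"
      continue
    fi
    # Extract the lines between the opening and closing '---'.
    front=$(awk 'NR == 1 { next } /^---$/ { exit } { print }' "$file")
    # Assumed required keys; adjust the list to match your own template.
    for key in title date; do
      printf '%s\n' "$front" | grep -q "^$key:" || echo "$file: missing $key"
    done
  done
}
```

Running check_front_matter markdown-output prints one line per problem and nothing at all when every file passes.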
Organize media files
- Create a new directory structure:
mkdir -p assets/{images,documents,downloads}
- Sort your media files:
# Sort by file type
find media_export -type f \( -iname "*.jpg" -o -iname "*.jpeg" -o -iname "*.png" \) -exec mv {} assets/images/ \;
find media_export -type f -name "*.pdf" -exec mv {} assets/documents/ \;
find media_export -type f -name "*.zip" -exec mv {} assets/downloads/ \;
# Update paths in content (this swaps only the top-level folder name; extend the pattern if your references need the images/, documents/, or downloads/ subfolder)
find markdown-output -type f -name "*.md" -exec sed -i 's|media_export|assets|g' {} \;
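After moving files and rewriting paths, it's worth checking that image references still resolve. This is a rough sketch under stated assumptions: find_missing_media is a hypothetical helper, references are treated as relative to the site root (for example /assets/images/photo.jpg), and remote URLs are skipped rather than fetched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print image references that do not resolve to a file.
# $1 = directory of Markdown files, $2 = site root that contains assets/.
find_missing_media() {
  content_dir=$1
  site_root=$2
  # Pull the target out of every ![alt](target) image reference.
  grep -rhoE --include='*.md' '!\[[^]]*\]\([^) ]+' "$content_dir" |
    sed -E 's/^!\[[^]]*\]\(//' |
    sort -u |
    while IFS= read -r ref; do
      case $ref in
        http://*|https://*) continue ;;  # remote images are not checked
      esac
      # Strip a leading slash so the path resolves under the site root.
      [ -f "$site_root/${ref#/}" ] || printf '%s\n' "$ref"
    done
}
```

For example, find_missing_media markdown-output . lists each broken reference once, which makes it easier to spot a path pattern that the sed replacement missed.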
Validation checklist
As our final step, we’re ready to make sure that absolutely everything from your WordPress site has been exported or extracted. Here’s a sample checklist that should cover most sites:
- Verify all posts and pages are exported
- Check media files are complete and organized
- Validate custom field data export
- Confirm all taxonomies are preserved
- Test sample content conversions
- Verify media file paths in converted content
- Check character encoding in exported files
- Validate front matter generation
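Some of these checks can be scripted. As one example, the character encoding item could be sketched as below; find_invalid_utf8 is a hypothetical helper, and it relies on iconv exiting with a non-zero status when it meets a byte sequence that is not valid UTF-8.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print Markdown files that are not valid UTF-8.
find_invalid_utf8() {
  find "$1" -type f -name '*.md' | sort | while IFS= read -r file; do
    # Converting UTF-8 to UTF-8 fails only if the input is malformed.
    iconv -f UTF-8 -t UTF-8 "$file" > /dev/null 2>&1 || printf '%s\n' "$file"
  done
}
```

Running find_invalid_utf8 markdown-output prints nothing when every file is valid; any file it lists is a good candidate for re-exporting or converting by hand.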
Congratulations! You've now mapped all of your site's content, removed everything from WordPress, turned it into usable and editable files, and you're ready to get started with CloudCannon!
Feel free to reach out to our support team if you'd like a hand getting set up.