My Ghost to Hugo migration

If you've visited this blog before, you might have noticed that it looks a bit different. You are not mistaken: I spent some time migrating it from the Ghost blogging platform to a static site generator called Hugo. In this post I document my migration steps in case they are useful to other people.

Ghost and Hugo are not 100% comparable. Ghost is a full-blown publishing platform with features like newsletters, paid membership programs and other tools for professionals. Hugo, on the other hand, is also extremely powerful but much more focused on being customizable and fast. It doesn't serve pages dynamically; it pre-generates the HTML pages, which are then served by the web server (nginx in my case). This makes serving pages a lot faster, with the downside that I can't just update a post and have it show up on the site without regenerating the HTML files.

You might ask: "If both are so great why switch then?"

My Ghost workflow consisted of writing posts on iOS or macOS in iA Writer. Once I was ready to publish a post I directly pushed it to Ghost from within iA Writer. This works well but there are multiple problems:

I can't sync changes made in Ghost (like when I fix a typo in the Ghost CMS) back to my local machine.
Ghost doesn't store posts as plain Markdown but in a richer format that supports embedded images, quotes etc.
The theme I was using for Ghost required me to add an image for every post. This always added some friction to the process of writing that I didn't like.

Migration

There were a few things I had to take care of in the process. Unfortunately there's no shortcut or out-of-the-box solution for that.

What I definitely wanted to achieve:

Don't lose the post history
Don't break existing URLs
Don't mess up post formats / break images
Keep the "featured" image of a post in place, that's the one that Ghost shows in the post header

1) Export posts from Ghost

To export the posts from Ghost I used their export feature, which gives you a nice JSON file to work with. Then I used a tool called ghostToHugo, which converts the posts into Markdown files with the correct file names and the Front Matter that Hugo expects.
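Before converting anything it can help to check what the export actually contains, for example to compare the number of posts with the number of Markdown files you end up with. The following is a minimal sketch, not part of my migration scripts. It assumes the export keeps its posts under db[0].data.posts and that each post has title and slug fields; the struct names and the sample JSON are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"strings"
)

// ghostExport mirrors only the parts of the export that are read here.
// Assumption: posts are stored under db[n].data.posts.
type ghostExport struct {
	DB []struct {
		Data struct {
			Posts []ghostPost `json:"posts"`
		} `json:"data"`
	} `json:"db"`
}

// ghostPost holds the two fields this sketch cares about.
type ghostPost struct {
	Title string `json:"title"`
	Slug  string `json:"slug"`
}

// listPosts decodes an export and returns every post it contains.
func listPosts(raw string) ([]ghostPost, error) {
	var exp ghostExport
	if err := json.NewDecoder(strings.NewReader(raw)).Decode(&exp); err != nil {
		return nil, err
	}
	var posts []ghostPost
	for _, db := range exp.DB {
		posts = append(posts, db.Data.Posts...)
	}
	return posts, nil
}

func main() {
	// Made-up sample; a real export contains many more fields.
	sample := `{"db":[{"data":{"posts":[{"title":"My old blog post","slug":"my-old-blog-post"}]}}]}`
	posts, err := listPosts(sample)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("posts in export:", len(posts))
	for _, p := range posts {
		fmt.Println(p.Slug)
	}
}
```

If the count differs from the number of converted files, that's a good moment to find out why, before the later scripts start moving things around.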

Images are not included in the export from Ghost so you have to get them yourself from your server with scp / ftp or whatever you were using before and temporarily store them in a directory somewhere. We need them for step 3.

2) Set up new Hugo blog

Create a new Hugo site, customize the theme and make sure it has working full RSS feeds. This doesn't sound like a lot but it's what took up most of my time.

3) Cleaning data, migrating posts, fixing images

This is not a step-by-step tutorial, as the details always depend on what your data looks like. I'm just trying to give an idea of what I had to do and provide some code snippets for inspiration.

This step was the most annoying one as I had to write a bunch of scripts to fix the exported and converted posts. After using ghostToHugo they were in the right format but in the wrong location, images were embedded in different ways, the images were not in the directory of the post and the "featured" image of the posts was not set.

This also took up way more time than expected because I was using Hugo's Page Bundles. That means each post is a directory called 2020-01-01-slug-of-post containing an index.md file with the actual blog post, and any images used in the post are stored in this directory too. I went with this approach over the default of a flat list of files with all images stored in static/ because that becomes messy very fast.

Script 1: Fix directory structure

The first step was a script that creates these directories and index.md files from the list of flat files exported from Ghost.

Input:

my-old-blog-post.md

Output after my script:

2015-01-01-my-old-blog-post
└── index.md

Most of these scripts are roughly the same, so I'm just including one for reference and then some snippets. It iterates over the directory of posts, extracts the data we need for the new directory structure (date, slug) from the old .md file, creates the directory, moves the .md file and renames it to index.md:

func main() {
    files, err := ioutil.ReadDir("/Users/philipp/export-hugo/content/post")
    if err != nil {
        log.Fatal(err)
    }

    for _, file := range files {
        f, err := os.Open("/Users/philipp/export-hugo/content/post/" + file.Name())
        if err != nil {
            fmt.Println(err)
            continue
        }
        scanner := bufio.NewScanner(f)
        scanner.Split(bufio.ScanLines)
        var date, slug string

        for scanner.Scan() {
            matches := reDate.FindStringSubmatch(scanner.Text())
            if len(matches) == 2 {
                date = matches[1]
            }
            matches = reSlug.FindStringSubmatch(scanner.Text())
            if len(matches) == 2 {
                slug = matches[1]
            }
        }

        f.Close()

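        t, err := time.Parse(time.RFC3339, date)
        if err != nil {
            // Skip files without a parsable date instead of creating a
            // directory named after the zero time.
            fmt.Println(err)
            continue
        }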

        newDir := t.Format("2006-01-02") + "-" + slug
        if err := os.Mkdir(newDir, 0755); err != nil {
            fmt.Println(err)
        }
        err = os.Rename("/Users/philipp/export-hugo/content/post/"+file.Name(), newDir+"/index.md")
        if err != nil {
            log.Fatal(err)
        }
    }
}
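The script above uses reDate and reSlug without showing how they are defined. Here is a minimal sketch of what they could look like, assuming TOML Front Matter lines such as date = "..." and slug = "...", in the same style as the image key matched in script 3. The two patterns and the bundleName helper are hypothetical, so adjust them to whatever your converted files contain.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
	"time"
)

// Hypothetical patterns, assuming TOML Front Matter lines like
//   date = "2015-01-01T10:00:00Z"
//   slug = "my-old-blog-post"
var (
	reDate = regexp.MustCompile(`^date\s=\s"(.+)"`)
	reSlug = regexp.MustCompile(`^slug\s=\s"(.+)"`)
)

// bundleName returns the page bundle directory name (date plus slug)
// for the contents of one converted post.
func bundleName(post string) (string, error) {
	var date, slug string
	for _, line := range strings.Split(post, "\n") {
		if m := reDate.FindStringSubmatch(line); len(m) == 2 {
			date = m[1]
		}
		if m := reSlug.FindStringSubmatch(line); len(m) == 2 {
			slug = m[1]
		}
	}
	if slug == "" {
		return "", errors.New("no slug found")
	}
	t, err := time.Parse(time.RFC3339, date)
	if err != nil {
		return "", err
	}
	return t.Format("2006-01-02") + "-" + slug, nil
}

func main() {
	post := "+++\ndate = \"2015-01-01T10:00:00Z\"\nslug = \"my-old-blog-post\"\n+++\n\nBody"
	name, err := bundleName(post)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(name)
}
```

Returning an error for a missing slug or an unparsable date makes it easy to skip a broken file instead of creating a wrongly named directory.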

Script 2: Extract image names, find image and move it

Current state: Posts are in the correct format and in the correct location (directory with date and slug containing index.md file with the post body)

We now have to extract all image names from each post, find the images in our directory of images we downloaded, then move the image to the corresponding post directory.

The images are linked in different ways in Ghost, depending on which options you choose or if it's a pure Markdown post or a mix. I had a bunch of posts that were purely in Markdown format, and a bunch of them that used <figure> for image captions.

I used the following regular expressions to extract them from the index.md files:

reImageCaption   = regexp.MustCompile(`figure\ssrc="(.+?)".+caption="<em>(.+?)<\/em>"`)
reImageNoCaption = regexp.MustCompile(`figure\ssrc="(.+?)".+?`)
reImagesInline   = regexp.MustCompile(`!\[.*\]\((.+?)\)`)
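To get a feeling for what these three expressions capture, here is a small sketch that runs them against sample lines. The regular expressions are the ones from above; the imageURL helper and the sample lines are made up for illustration, so your exported posts may look different.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	reImageCaption   = regexp.MustCompile(`figure\ssrc="(.+?)".+caption="<em>(.+?)<\/em>"`)
	reImageNoCaption = regexp.MustCompile(`figure\ssrc="(.+?)".+?`)
	reImagesInline   = regexp.MustCompile(`!\[.*\]\((.+?)\)`)
)

// imageURL returns the image URL and caption found on a line. It tries the
// most specific pattern first, the same order the migration script uses.
func imageURL(line string) (url, caption string, ok bool) {
	if m := reImageCaption.FindStringSubmatch(line); len(m) == 3 {
		return m[1], m[2], true
	}
	if m := reImageNoCaption.FindStringSubmatch(line); len(m) == 2 {
		return m[1], "", true
	}
	if m := reImagesInline.FindStringSubmatch(line); len(m) == 2 {
		return m[1], "", true
	}
	return "", "", false
}

func main() {
	// Made-up sample lines covering the three ways images were embedded.
	lines := []string{
		`{{< figure src="/content/images/2015/01/cat.jpg" caption="<em>A cat</em>" >}}`,
		`{{< figure src="/content/images/2015/01/dog.png" >}}`,
		`![a bird](/content/images/2015/01/bird.gif)`,
		`Just a regular paragraph.`,
	}
	for _, line := range lines {
		url, caption, ok := imageURL(line)
		fmt.Printf("ok=%v url=%q caption=%q\n", ok, url, caption)
	}
}
```

The order matters: a line with a caption also matches the pattern without a caption, so the more specific expression has to be tried first.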

The script in essence iterates over all posts, tries to find images with the regular expressions above and then moves them from their old location to the new one.

func main() {
    files, err := ioutil.ReadDir("/Users/philipp/Blog/blog.notmyhostna.me/content/posts")
    if err != nil {
        log.Fatal(err)
    }
    for _, file := range files {
        if file.Name() == ".DS_Store" {
            continue
        }
        f, err := os.Open("/Users/philipp/Blog/blog.notmyhostna.me/content/posts/" + file.Name())
        if err != nil {
            fmt.Println(err)
            continue
        }

        postFiles, err := ioutil.ReadDir(f.Name())
        if err != nil {
            log.Fatal(err)
        }
        for _, pfl := range postFiles {
            if !strings.Contains(pfl.Name(), ".md") {
                continue
            }
            //fmt.Println("> found post file in directory", pfl.Name())
            pf, err := os.Open(f.Name() + "/" + pfl.Name())
            if err != nil {
                fmt.Println(err)
                continue
            }
            scanner := bufio.NewScanner(pf)
            scanner.Split(bufio.ScanLines)

            type imageWithCaption struct {
                url     string
                caption string
            }
            var images []imageWithCaption

            var rows []string
            for scanner.Scan() {
                t := scanner.Text()
                matches := reImageCaption.FindStringSubmatch(t)
                var found bool
                if len(matches) == 3 {
                    found = true
                    iwc := imageWithCaption{
                        url:     matches[1],
                        caption: matches[2],
                    }
                    if strings.Contains(matches[1], "/content/images") {
                        images = append(images, iwc)
                    }
                    rows = append(rows, fmt.Sprintf("![%s](%s)\n\n%s", filepath.Base(iwc.url), filepath.Base(iwc.url), iwc.caption))
                }
                if !found {
                    matches2 := reImageNoCaption.FindStringSubmatch(t)
                    if len(matches2) == 2 {
                        found = true
                        iwc := imageWithCaption{
                            url: matches2[1],
                        }
                        if strings.Contains(matches2[1], "/content/images") {
                            images = append(images, iwc)
                        }
                        rows = append(rows, fmt.Sprintf("![%s](%s)", filepath.Base(iwc.url), filepath.Base(iwc.url)))
                    }
                }
                if !found {
                    matches3 := reImagesInline.FindStringSubmatch(t)
                    if len(matches3) == 2 {
                        found = true
                        iwc := imageWithCaption{
                            url: matches3[1],
                        }
                        if strings.Contains(matches3[1], "/content/images") {
                            images = append(images, iwc)
                        }
                    }
                }
                if !found {
                    rows = append(rows, t)
                }
            }

            // The post has been read completely, so close it before moving files.
            pf.Close()

            if len(images) == 0 {
                continue
            }

            f.Close()
            fmt.Println("images", images)
            for _, iwc := range images {
                oldPath := "/Users/philipp/export-hugo-images" + iwc.url
                fmt.Println("oldPath: ", oldPath)
                fn := filepath.Base(iwc.url)
                newPath := f.Name() + "/" + fn
                fmt.Println("new path: ", newPath)
                err = os.Rename(oldPath, newPath)
                if err != nil {
                    fmt.Println("err but moving on")
                }
            }
        }
    }
}

Script 3: Set featured image of post

In the converted files there's a key called image in the Front Matter of each post. This contains the file name of an image that used to be the "Featured" image of a post in Ghost (the big image above a post).

I didn't want to be forced to set an image for each post I'm publishing in the future so I just wanted Hugo to set an image if there's a file called feature.{jpg,png} in the post directory. To achieve that I added a condition in my template that does just that.

<div class="image">
    <a href="{{.RelPermalink}}">
    {{ $image := .Resources.GetMatch "feature.*" }}
    {{ with $image }}
        <img src="{{ .RelPermalink }}">
    {{ end }}
    </a>
</div>

The next step was to copy the image that was defined in the image key of my post from the downloaded images to my post directory and rename it to feature.{jpg,png}.

That was pretty easy as I just had to extract the image name from the post, iterate over the image files, take the matching one and rename / move it.

reImage = regexp.MustCompile(`image\s=\s"(.+)"`)

// Excerpt from the per-post loop; the surrounding code mirrors script 2.
    for scanner.Scan() {
        matches := reImage.FindStringSubmatch(scanner.Text())
        if len(matches) == 2 {
            image = matches[1]
        }
    }

    // Posts without a featured image need no further work.
    if image == "" {
        continue
    }

    f.Close()

    oldPath := "/Users/philipp/export-hugo-images/content" + image
    fmt.Println("old image path: ", oldPath)

    // Keep the original extension so the file becomes feature.jpg or feature.png.
    ext := filepath.Ext(filepath.Base(image))
    newPath := f.Name() + "/" + "feature" + ext
    fmt.Println("new image path: ", newPath)
    err = os.Rename(oldPath, newPath)
    if err != nil {
        log.Fatal(err)
    }
} // end of the per-post loop

The last step was a bunch of search / replace actions in VS Code sprinkled with some regex magic to remove old file paths for images from the post bodies and to clean up unused keys in the Front Matter (author, image, unnecessary new lines,...).

4) Migrate URLs

It's very important not to break URLs that are referenced elsewhere or indexed by Google. Luckily Hugo already has a mechanism in place to take care of that. First we have to look at what we are dealing with.

The Ghost URLs for a blog post were in the following format:

blog.notmyhostna.me/slug-of-post

I defined my URL structure in config.yaml to look like this:

permalinks:
  posts: ':section/:title/'

This results in URLs in this format:

blog.notmyhostna.me/posts/slug-of-post

Hugo's solution to the problem is called Aliases: you only have to provide alternative URLs for the given resource in the Front Matter of the post. This was easily done by duplicating the slug key that ghostToHugo created for us and renaming it to aliases. Be aware that aliases accepts a list of URLs, which is why the format looks a bit different in YAML.

---
slug: "apple-ruined-itunes-what-now"
aliases:
    - "/apple-ruined-itunes-what-now"
title: "Apple ruined iTunes — What now?"
---
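I did this with search and replace, but if you have a lot of posts the same change can be scripted. The following is a minimal sketch, assuming YAML Front Matter delimited by --- with a quoted slug value as in the example above. The addAlias function and the reYAMLSlug pattern are hypothetical and only cover that simple case.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// reYAMLSlug matches a YAML Front Matter line such as: slug: "my-post"
var reYAMLSlug = regexp.MustCompile(`^slug:\s*"(.+)"\s*$`)

// addAlias returns the post with an aliases list inserted right after the
// slug line of the Front Matter. Lines outside the Front Matter are never
// changed, and posts without a slug line are returned as they are.
func addAlias(post string) string {
	lines := strings.Split(post, "\n")
	out := make([]string, 0, len(lines)+2)
	delims := 0 // number of "---" delimiter lines seen so far
	for _, line := range lines {
		out = append(out, line)
		if strings.TrimSpace(line) == "---" {
			delims++
			continue
		}
		if delims != 1 {
			continue // before the Front Matter or already in the post body
		}
		if m := reYAMLSlug.FindStringSubmatch(line); len(m) == 2 {
			out = append(out, "aliases:", fmt.Sprintf("    - \"/%s\"", m[1]))
		}
	}
	return strings.Join(out, "\n")
}

func main() {
	post := "---\nslug: \"apple-ruined-itunes-what-now\"\ntitle: \"Some title\"\n---\n\nBody\n"
	fmt.Print(addAlias(post))
}
```

Note that the sketch doesn't check whether a post already has an aliases key, so running it twice would add the entry twice.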

Your post is now reachable from both URLs: the old URL redirects to the new one, and the correct canonical URL is set in the header.

I hope this was somewhat helpful and if you have any specific questions feel free to reach out. Happy to help!