Posts in category development

WordPress to Jekyll part 5 - Hosting & building

Part of my series on migrating from WordPress to Jekyll.

  1. My history & reasoning
  2. Comments & commenting
  3. Site search
  4. Categories & tags
  5. Hosting & building

The next stage is considering where to host the site and whether to use a content delivery network (CDN). My preferred approach on other sites has been to:

  1. Host the origin on GitHub Pages - it’s fast to build and integrates with my source control
  2. Front it with Amazon’s AWS CloudFront CDN - it’s fast, cheap and comes with a free SSL cert

Adding the CloudFront CDN was essential if you wanted SSL plus your own domain name, but this May GitHub Pages added support for SSL certs with custom domains.

Unfortunately my blog is a bit more complex than the other sites I’ve done and two of the plugins I use have not been whitelisted for use on GitHub Pages. They are:

  1. paginate-v2 which is required to get great tag & category support
  2. Algolia which is needed for search indexing

Part of GitHub’s blazing speed comes from a trusted environment and, while I’m sure they’ll whitelist paginate-v2 in the short term, I’m not sure the Algolia indexer is on the cards.

CircleCI build server

There are always plenty of options in the cloud so I looked for a build server. I’ve used AppVeyor, CodeShip and Travis CI before but decided to go with CircleCI this time as I wanted to try their new, faster v2 Docker-based infrastructure and take advantage of their free tier.

The v2 mechanism requires a new .circleci/config.yml that splits the process into jobs combined by a workflow. I created two jobs - one for the build and another for the deploy. Here’s the build job:

version: 2
jobs:
  build:
    docker:
      - image: circleci/ruby:2.3
    working_directory: ~/jekyll
    environment:
      - JEKYLL_ENV=production
      - NOKOGIRI_USE_SYSTEM_LIBRARIES=true
      - JOB_RESULTS_PATH=run-results
    steps:
      - checkout
      - restore_cache:
          key: jekyll-{{ .Branch }}-{{ checksum "Gemfile.lock" }}
      - run:
          name: Install dependencies
          command: bundle check --path=vendor/bundle || bundle install --path=vendor/bundle --jobs=4 --retry=3
      - save_cache:
          key: jekyll-{{ .Branch }}-{{ checksum "Gemfile.lock" }}
          paths:
            - "vendor/bundle"
      - run:
          name: Create results directory
          command: mkdir -p $JOB_RESULTS_PATH
      - run:
          name: Build site
          command: bundle exec jekyll build 2>&1 | tee $JOB_RESULTS_PATH/build-results.txt
      - run:
          name: Remove .html suffixes
          command: find _site -name "*.html" -not -name "index.html" -exec rename -v 's/\.html$//' {} \;
      - run:
          name: Index with Algolia
          command: bundle exec jekyll algolia
      - store_artifacts:
          path: run-results/
          destination: run-results
      - persist_to_workspace:
          root: ~/jekyll
          paths:
            - _site

Origin hosting with S3

Given I’m going to use CloudFront for my CDN and that GitHub Pages won’t work for this job I went with S3. I know it well, the command-line tools are great, and it’s cheap, fast and integrates well with CloudFront.

S3 did, however, bring a few problems of its own - primarily because the links on my blog have no file suffixes. I didn’t want either .php or .html and, while WordPress makes extension-free URLs a breeze, S3 does not.

Here’s my CircleCI job to deploy to S3. It involves:

  1. Starting with Python to get the AWS command-line tools
  2. Syncing the static site forcing everything as text/html to deal with the lack of file extensions
  3. Fixing up the few files I have that require a different MIME type (css, feed, robots etc)
  4. Creating a few helpful redirects for backward compatibility with existing links in the wild

(This configuration requires that you’ve set up the AWS access key and secret in CircleCI for the command-line tools to use.)

deploy:
  docker:
    - image: circleci/python:2.7
  working_directory: ~/jekyll
  steps:
    - attach_workspace:
        at: ~/jekyll
    - run:
        name: Install awscli
        command: sudo pip install awscli
    - run:
        name: Deploy to S3
        command: aws s3 sync _site s3://damieng-static/ --delete --content-type=text/html
    - run:
        name: Correct MIME for robots.txt automatically
        command: aws s3 cp s3://damieng-static/robots.txt s3://damieng-static/robots.txt --metadata-directive="REPLACE"
    - run:
        name: Correct MIME for sitemap.xml automatically
        command: aws s3 cp s3://damieng-static/sitemap.xml s3://damieng-static/sitemap.xml --metadata-directive="REPLACE"
    - run:
        name: Correct MIME for Atom feed manually
        command: aws s3 cp s3://damieng-static/feed.xml s3://damieng-static/feed.xml --no-guess-mime-type --content-type="application/atom+xml" --metadata-directive="REPLACE"
    - run:
        name: Redirect /damieng for existing RSS subscribers
        command: aws s3api put-object --bucket damieng-static --key "damieng" --website-redirect-location "https://damieng.com/feed.xml"
    - run:
        name: Correct MIME for CSS files
        command: aws s3 cp s3://damieng-static/css s3://damieng-static/css --metadata-directive="REPLACE" --recursive

Tying together the build

Finally, you just need a workflow to tie these two jobs together at the end of your .circleci/config.yml:

workflows:
  version: 2
  build-deploy:
    jobs:
      - build
      - deploy:
          requires:
            - build
          filters:
            branches:
              only: master

A complete version of my CircleCI config is available.

CloudFront CDN

Adding the CloudFront CDN is pretty easy and well covered elsewhere. I’ll just point out that you must paste in the origin domain name from S3 - that is, the bucket’s static website hosting endpoint - and not choose the S3 bucket from the drop-down. The latter ties CloudFront to the storage directly and ignores MIME types, redirects etc. By pasting the website endpoint in you’re taking advantage of the S3 website features that make those redirects possible.
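
Concretely, the website endpoint has a different hostname format from the bucket endpoint the drop-down offers. The exact hostnames depend on your bucket name and region, so the ones below are purely illustrative:

damieng-static.s3-website-us-east-1.amazonaws.com   <- website endpoint, paste this as the origin
damieng-static.s3.amazonaws.com                     <- bucket (REST) endpoint, what the drop-down gives you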

Also, while testing, you might want to specify a low TTL of, say, 120 seconds (2 minutes) until things are fully stable.

[)amien

WordPress to Jekyll part 4 - Categories and tags

Part of my series on migrating from WordPress to Jekyll.

  1. My history & reasoning
  2. Comments & commenting
  3. Site search
  4. Categories & tags
  5. Hosting & building

Jekyll does support categories and tags itself; however, it doesn’t support paginating the category and tag list pages. This is instead solved by the paginate-v2 gem, which also lets you tweak the URL format.

My site used the URL formats /blog/category/{category-name} and /blog/tag/{tag-name} with 4 articles per page and a little pager at the bottom indicating which page you are on, plus some navigation arrows, like this:

The pager

In order to render this pager a little Liquid templating is required. Here’s my _includes/pagination.html that’s included within my multiple-posts layout used on the home page, categories and tag results.

{% if paginator.total_pages > 1 %}
<div class="pagination pagination-centered">
  <ul class="page-numbers">
  {% if paginator.previous_page %}
    <li><a href="{{ paginator.previous_page_path }}" class="prev">«</a></li>
  {% endif %}

  {% if paginator.page_trail %}
    {% for trail in paginator.page_trail %}
      <li>
        {% if page.url == trail.path %}
          <span class="page-numbers current">{{ trail.num }}</span>
        {% else %}
          <a href="{{ trail.path | prepend: site.baseurl | replace: '//', '/' }}" title="{{ trail.title }}">{{ trail.num }}</a>
        {% endif %}
      </li>
    {% endfor %}
  {% endif %}

  {% if paginator.next_page %}
    <li><a href="{{ paginator.next_page_path }}" class="next">»</a></li>
  {% endif %}
  </ul>
</div>
{% endif %}

Configuring paginate-v2

I configured paginate-v2 as closely as I could to keep the experience consistent with my WordPress install, although the page numbers in the URL are different:

autopages:
  enabled: true
  collections:
    enabled: false
  categories:
    enabled: true
    layouts:
      - home.html
    permalink: '/blog/category/:cat'
    slugify:
      mode: pretty
  tags:
    enabled: true
    layouts:
      - home.html
    permalink: '/blog/tag/:tag'
    slugify:
      mode: pretty

pagination:
  enabled: true
  per_page: 4
  permalink: '/page/:num/'
  title: ':title - page :num'
  limit: 0
  sort_field: 'date'
  sort_reverse: 'true'
  trail:
    before: 2
    after: 2

Auditing categories and tags

Twelve years of blogging and multiple platforms can play havoc with the categories and tags you’ve used over the years. I wrote a quick page that lists all the categories and tags with a count next to each. Anything with only one or two articles is a waste of space so I’ve been cleaning up.

Here’s that page in case you want to add it to your site to help prune things down.

---
title: Audits
date: 2018-05-30 18:46:00-8:00
---
<h1>Audits</h1>

<h2>Categories</h2>
<ul>
{% for category in site.categories %}
  <li><a href="/blog/category/{{ category | first | replace: ' ', '-' | downcase }}">{{ category | first }}</a> ({{ category[1] | size }})</li>
{% endfor %}
</ul>

<h2>Tags</h2>
<ul>
{% for tag in site.tags %}
  <li><a href="/blog/tag/{{ tag | first | replace: ' ', '-' | downcase }}">{{ tag | first }}</a> ({{ tag[1] | size }})</li>
{% endfor %}
</ul>

See you in part 5 - hosting.

[)amien

WordPress to Jekyll part 3 - Site search

Part of my series on migrating from WordPress to Jekyll.

  1. My history & reasoning
  2. Comments & commenting
  3. Site search
  4. Categories & tags
  5. Hosting & building

Site search is a feature that WordPress got right and, importantly, analytics tell me is popular. A static site is once again at a big disadvantage but we have some options to address that.

Considering options

My first consideration was to use Google Site Search but that was deprecated last year. There are alternative options but few are free. I’m not opposed to people being paid for their services, something has to keep the lights on, but a small personal blog with no income stream can’t justify the cost.

My next thought was to generate reverse index JSON files during site build and then write some client-side JavaScript that would utilize them as the user types in the search box to find the relevant posts. It’s an idea I might come back to but the migration had already taken longer than I anticipated and I like to ship fast and often :)

Algolia

I soon came across Algolia, which not only provides a simple API and a few helper libraries but also a Jekyll plug-in to generate the necessary search indexes AND has a free tier that requires just a logo placement and a link to their site! Awesome.

Setup was a breeze and Algolia have a specific guide to indexing with Jekyll that was useful. Once you’ve signed up, the main parts are configuring the indexing and integrating search with your site.

Index integration

First, install the jekyll-algolia gem, making sure it’s specified in your Gemfile.
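
If you manage your plugins through Bundler that looks something like this - the jekyll_plugins group is the conventional place for Jekyll plugins:

# Gemfile
source 'https://rubygems.org'

gem 'jekyll'

group :jekyll_plugins do
  gem 'jekyll-algolia'
end

Run bundle install afterwards so Gemfile.lock picks up the new gem.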

Then configure your Jekyll _config.yml so it knows what to index and where as well as what document attributes are important:

algolia:
  application_id: {your-algolia-app-id}
  index_name: {your-algolia-index-name}
  settings:
    searchableAttributes:
      - title
      - excerpt_text
      - headings
      - content
      - categories
      - tags
    attributesForFaceting:
      - type
      - searchable(categories)
      - searchable(tags)
      - searchable(title)

Finally you’ll need to run the indexing. Ensure the environment variable ALGOLIA_API_KEY is set to your private Admin API Key from your Algolia API Keys page, then run the following command after your site is built:

bundle exec jekyll algolia
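
If you’d rather not export the key into your shell permanently you can supply it inline for just that one command:

ALGOLIA_API_KEY=your-admin-api-key bundle exec jekyll algolia

On a build server such as CircleCI the same variable can instead be set in the project’s environment settings so it never appears in the repository.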

Site integration

Wiring up the search box can be a little overwhelming as they have so many clients, options and APIs available. I went with a design that presents the results as you type like this:

This uses two of their libraries - the search lite client and the search helper - plus some code to wire them up to my search box and render the results in a drop-down list. I’ll probably tweak the result format further and maybe consider wiring up to the API directly, as two libraries for such a simple use case seems a bit overkill.

<script src="https://cdn.jsdelivr.net/npm/algoliasearch@3/dist/algoliasearchLite.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/algoliasearch-helper@2.26.0/dist/algoliasearch.helper.min.js"></script>
<script>
  let searchForm = document.getElementById('search-form')
  let hits = document.getElementById('hits')
  let algolia = algoliasearch('{your-algolia-app-id}', '{your-algolia-search-token}')
  let helper = algoliasearchHelper(algolia, '{your-algolia-index-name}',
    { hitsPerPage: 10, maxValuesPerFacet: 1, getRankingInfo: false })
  helper.on('result', searchCallback)

  function runSearch() {
    let term = document.getElementById('s').value
    if (term.length > 0)
      helper.setQuery(term).search()
    else
      searchForm.classList.remove('open')
  }

  function searchCallback(results) {
    if (results.hits.length === 0) {
      hits.innerHTML = '<li><a>No results!</a></li>'
    } else {
      renderHits(results)
      searchForm.classList.add('open')
    }
    let credits = document.createElement('li');
    credits.innerHTML = "<img src=\"https://www.algolia.com/static_assets/images/press/downloads/search-by-algolia.svg\" onclick=\"window.open('https://www.algolia.com', '_blank')\" />"
    hits.appendChild(credits)
  }

  function renderHits(results) {
    hits.innerHTML = ''
    for (let i = 0; i < results.hits.length; i++) {
      let li = document.createElement('li')
      let title = document.createElement('a')
      title.innerHTML = results.hits[i]._highlightResult.title.value
      title.href = results.hits[i].url
      li.appendChild(title)
      hits.appendChild(li)
    }
  }
</script>

Analytics

I’m a big proponent of analytics when used purely for engineering improvement, and Algolia provides a useful dashboard to let you know how performance is doing, what topics are being searched for and what searches might not be returning useful content.

I’ll dig through that when I have a little more time, however. The backlog of ideas for posts is taking priority right now!

[)amien

Note: I did not and do not receive any compensation from Algolia, either directly or via any kind of referral program. I’m just a happy user.

WordPress to Jekyll part 2 - Comments & commenting

Part of my series on migrating from WordPress to Jekyll.

  1. My history & reasoning
  2. Comments & commenting
  3. Site search
  4. Categories & tags
  5. Hosting & building

I do enjoy discussion and debate whether designing software or writing articles. Many times the comments have explored the subject further or offered corrections or additional insights and tips. For me, they are vital on my blog so I was somewhat disappointed that Jekyll provides nothing out of the box to handle them.

Third-party solutions like Disqus exist but require that you either pay a subscription or have ads inlined with the comments. That $9/month adds up, and the alternative of injecting ads onto my blog just to support comment infrastructure doesn’t sit right with me.

Storing comments

So what does Jekyll have that we could build upon?

Well, one very useful feature is the ability to process ‘site data’ held in YML files as a kind of data source for generating content via the Liquid templating language.

So, if we store each comment in a file named _data/{blog_post_slug}/{comment_id}.yml with this format:

id: 12345
name: Damien Guard
email: damieng@gmail.com
gravatar: dc72963e7279d34c85ed4c0b731ce5a9
url: https://damieng.com
date: 2007-12-18 18:51:55
message: "This is a great solution for 'dynamic' comments on a static blog!"

Then we have a model where we can gather all the comments that belong to a post by traversing a single folder and performing some sorting.

By using one-file-per-comment we also make deleting, approving and managing comments as easy as possible.

Rendering comments

Now we can create test data and attempt rendering. I created three Jekyll includes that match my WordPress theme. They are:

  • Render an individual comment (comment.html)
  • Show a form to accept a new comment (new-comment.html)
  • Loop over individual comments for a post (comments.html)

All three includes are available for you to copy to your Jekyll _includes folder.

The simplest option is to then just include the comments.html file. For example, my blog post template file looks like this:

---
layout: default
---
<div class="post {{ page.class }}">
  {% include item.html %}
  {{ page.content }}
  {% include comments.html %}
</div>

You’ll also need to add the following line to your Jekyll _config.yml. This is required so my sort function can work - Liquid offers no way to create an empty array directly, so one is defined in the site config to start from.

emptyArray: []
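
To show how the pieces fit together, here’s a rough sketch of the kind of loop comments.html can use to gather the data files for the current post, sort them and render each one. This is illustrative rather than my exact include and assumes the _data folder name matches the post’s slug:

{% comment %} Collect every comment file for this post into a sortable array {% endcomment %}
{% assign comments = site.emptyArray %}
{% for entry in site.data[page.slug] %}
  {% assign comments = comments | push: entry[1] %}
{% endfor %}

{% comment %} Sort by date and render each comment via the comment.html include {% endcomment %}
{% assign comments = comments | sort: 'date' %}
{% for comment in comments %}
  {% include comment.html comment=comment %}
{% endfor %}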

Exporting comments from WordPress

The next step is getting all the comments out of your existing system. I was using WordPress so I created a simple PHP script that extracts them all into individual files with the right metadata and structure.

  • Upload this file to your site
  • Access export-blog-comments.php via your browser and wait for it to complete
  • Download the /comments/ folder over SSH and then remove it and the export-blog-comments.php from your server
  • Copy the /comments/ folder into your Jekyll _data/ folder

Disqus users should check out Phil Haack’s Disqus exporter!

Accepting new comments with an Azure function

We can now render existing comments but what about accepting new ones?

At a minimum we need to accept an HTTP form post and commit a new YML file - ideally with some validation, a redirect to a thanks page, and the new YML file wrapped in a pull request or some other moderation facility. Merging the PR will cause a site rebuild and publish the new comment :)

Platform and choices

I chose:

  1. GitHub to host my blog and comments as I use it for my code projects
  2. Azure Function App for the form-post-to-pull-request - details below
  3. C# for the function - a great language I know with good libs

I went with Azure Function Apps for a few reasons:

  • They accept HTTP/HTTPS directly without configuring an “API Gateway”
  • Comment posting is a short-lived operation that happens quite infrequently
  • Free monthly grants of 1 million executions/400,000 GB-s should mean no charge
  • Taking a second or two to spin up the function should be fine in the user’s context

(Disclaimer: I have a free MSDN subscription that includes Azure credits as part of my ASP Insider membership, although I do not expect this solution to use any of them.)

Other platforms

You could easily port this to another C#-capable environment - or port the solution entirely to another language.

If you have a lot of comments you could run the function on three platforms and round-robin the DNS to take advantage of the free usage tiers on each.

How it works

The form receiver function for comments relies on a couple of libraries to deal with YML and GitHub but is otherwise self-explanatory. What it does (roughly sketched in code after the list) is:

  1. Receives the form post over HTTP/HTTPS
  2. Attempts to create an instance of the Comment class by mapping form keys to constructor args
  3. Emits errors if any constructor args are missing (unless they have a default)
  4. Creates a new branch against your default branch using the Octokit.net GitHub library
  5. Creates a commit to the new branch with the Comment serialized to YML using YamlDotNet
  6. Creates a pull request to merge the branch with an informative title and body
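
To make those steps concrete, here’s a rough sketch of what such a function could look like using Octokit.net and YamlDotNet. This is a simplified illustration rather than the code in the repo - the Comment type, form field names, branch naming and lack of validation are all assumptions:

// Sketch only: a simplified comment receiver, not the exact jekyll-blog-comments-azure code.
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Octokit;
using YamlDotNet.Serialization;

public static class PostComment
{
    [FunctionName("PostComment")]
    public static async Task<HttpResponseMessage> Run(
        [HttpTrigger(AuthorizationLevel.Anonymous, "post")] HttpRequestMessage request)
    {
        // 1. Read the posted form fields (validation omitted for brevity)
        var form = await request.Content.ReadAsFormDataAsync();
        var comment = new Comment(form["post_id"], form["name"], form["email"], form["url"], form["message"]);

        // 2. Connect to GitHub with the token from the GitHubToken application setting
        var github = new GitHubClient(new ProductHeaderValue("jekyll-blog-comments"))
        {
            Credentials = new Credentials(Environment.GetEnvironmentVariable("GitHubToken"))
        };
        var repoParts = Environment.GetEnvironmentVariable("PullRequestRepository").Split('/');
        var repo = await github.Repository.Get(repoParts[0], repoParts[1]);

        // 3. Create a new branch off the repository's default branch
        var defaultBranch = await github.Repository.Branch.Get(repo.Id, repo.DefaultBranch);
        var branchName = $"comment-{comment.id}";
        await github.Git.Reference.Create(repo.Id, new NewReference($"refs/heads/{branchName}", defaultBranch.Commit.Sha));

        // 4. Commit the comment, serialized as YML, to the new branch
        var yml = new SerializerBuilder().Build().Serialize(comment);
        await github.Repository.Content.CreateFile(repo.Id, $"_data/{comment.post_id}/{comment.id}.yml",
            new CreateFileRequest($"Comment by {comment.name} on {comment.post_id}", yml, branchName));

        // 5. Open a pull request so the comment can be reviewed before it is published
        await github.PullRequest.Create(repo.Id,
            new NewPullRequest($"Comment by {comment.name} on {comment.post_id}", branchName, repo.DefaultBranch));

        return request.CreateResponse(HttpStatusCode.OK);
    }
}

// Property names are lower-cased so YamlDotNet emits keys matching the comment file format shown earlier
public class Comment
{
    public Comment(string postId, string name, string email, string url, string message)
    {
        id = DateTime.UtcNow.Ticks.ToString();
        date = DateTime.UtcNow;
        post_id = postId;
        this.name = name;
        this.email = email;
        this.url = url;
        this.message = message;
    }

    public string id { get; }
    public string post_id { get; }
    public string name { get; }
    public string email { get; }
    public string url { get; }
    public DateTime date { get; }
    public string message { get; }
}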

Installation

Installation requires a few steps but the function will then update itself whenever you update your fork.

  1. Fork the jekyll-blog-comments-azure repo
  2. Create a Function App in the Azure portal (I went with consumption plan on Windows)
  3. Go to Deployment Options, tap Setup and choose GitHub
  4. Authorize it to your GitHub account
  5. Configure Project to your fork of jekyll-blog-comments-azure
  6. Configure Branch to master

You will also need to set up two Application Settings for your function so it can create the necessary pull requests. They are:

  • GitHubToken should be a personal access token with repo rights
  • PullRequestRepository should contain the org and repo name, e.g. damieng/my-blog

The final step is to modify your Jekyll _config.yml so it knows where to post the form. For example:

comments:
  receiver: https://damiengapp.azurewebsites.net/api/PostComment
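
For reference, the form in new-comment.html can then post straight to that receiver. Here’s a minimal sketch - the field names are illustrative and need to match whatever your receiver function expects:

<form method="POST" action="{{ site.comments.receiver }}">
  <input type="hidden" name="post_id" value="{{ page.slug }}" />
  <input type="text" name="name" placeholder="Your name" required />
  <input type="email" name="email" placeholder="Your email (not displayed)" />
  <input type="url" name="url" placeholder="Your website (optional)" />
  <textarea name="message" placeholder="Your comment" required></textarea>
  <button type="submit">Post comment</button>
</form>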

You should now be able to post a comment on your blog and see it turn up as a pull request against your repository!

Extra steps

  • You can have post authors’ replies highlighted differently
  • Threaded comments could be supported - feel free to send a pull request or I’ll get to this in time
  • Anti-spam measures will likely need to be improved at some point - right now this is just client-side in JS that requires a second ‘Confirm comment’ click

In Part 3 of the series I’ll go into how I implemented my site search with Algolia!

[)amien