Posts tagged with jekyll

From CircleCI to GitHub Actions for Jekyll publishing

I’ve been a big fan of static site generation since I switched from WordPress to Jekyll back in 2018. I’m also a big fan of learning new technologies as they come along, and now that GitHub Actions is out in the wild, I thought this would be an opportunity to see how I could port my existing custom CircleCI Jekyll build over to it.

The CircleCI job

As a quick recap from part 5 - Hosting & building: my CircleCI configuration was basically two jobs, both of which have been tweaked since then. They are:

Build

The build job’s responsibility was to configure a Ruby environment capable of running Jekyll, build the site, remove the .html extension from the output filenames, and then index the content using Algolia. Here is the start of the .circleci/config.yml I was using for that:

version: 2
jobs:
  build:
    docker:
      - image: circleci/ruby:2.6.1
    working_directory: ~/jekyll
    environment:
      - JEKYLL_ENV=production
      - NOKOGIRI_USE_SYSTEM_LIBRARIES=true
      - JOB_RESULTS_PATH=run-results
    steps:
      - checkout
      - restore_cache:
          key: jekyll-{{ .Branch }}-{{ checksum "Gemfile" }}
      - run:
          name: Update gems
          command: gem update --system
      - run:
          name: Install dependencies
          command: bundle check --path=vendor/bundle || bundle install --path=vendor/bundle --jobs=4 --retry=3
      - save_cache:
          key: jekyll-{{ .Branch }}-{{ checksum "Gemfile" }}
          paths:
            - "vendor/bundle"
      - run:
          name: Create results directory
          command: mkdir -p $JOB_RESULTS_PATH
      - run:
          name: Build site
          command: bundle exec jekyll build --config _config.yml,_config-publish.yml 2>&1 | tee $JOB_RESULTS_PATH/build-results.txt
      - run:
          name: Remove .html suffixes
          command: find _site -name "*.html" -not -name "index.html" -exec rename -v 's/\.html$//' {} \;
      - run:
          name: Index with Algolia
          command: bundle exec jekyll algolia --config _config.yml,_config-publish.yml
      - store_artifacts:
          path: run-results/
          destination: run-results
      - persist_to_workspace:
          root: ~/jekyll
          paths:
            - _site

Deploy

Deploy takes the output from the build job and syncs it with the S3 bucket I use to publish the site. Running in a Python environment, it then uses the AWS CLI to apply S3-specific commands that ensure metadata, redirects, and caching are set correctly.

  deploy:
    docker:
      - image: circleci/python:2.7
    working_directory: ~/jekyll
    steps:
      - attach_workspace:
          at: ~/jekyll
      - run:
          name: Install AWS CLI
          command: sudo pip install awscli
      - run:
          name: Deploy to S3
          command: aws s3 sync _site s3://damieng-static/ --delete --content-type=text/html
      - run:
          name: Correct MIME for robots.txt automatically
          command: aws s3 cp s3://damieng-static/robots.txt s3://damieng-static/robots.txt --metadata-directive="REPLACE"
      - run:
          name: Correct MIME for sitemap.xml automatically
          command: aws s3 cp s3://damieng-static/sitemap.xml s3://damieng-static/sitemap.xml --metadata-directive="REPLACE"
      - run:
          name: Correct MIME for Atom feed manually
          command: aws s3 cp s3://damieng-static/feed.xml s3://damieng-static/feed.xml --no-guess-mime-type --content-type="application/atom+xml" --metadata-directive="REPLACE"
      - run:
          name: Redirect /damieng for existing RSS subscribers
          command: aws s3api put-object --bucket damieng-static --key "damieng" --website-redirect-location "https://damieng.com/feed.xml"
      - run:
          name: Latest Envy Code R redirect
          command: aws s3api put-object --bucket damieng-static --key "envy-code-r" --website-redirect-location "https://damieng.com/blog/2008/05/26/envy-code-r-preview-7-coding-font-released"
      - run:
          name: "Latest Envy Code R redirect #2"
          command: aws s3api put-object --bucket damieng-static --key "fonts/envy-code-r" --website-redirect-location "https://damieng.com/blog/2008/05/26/envy-code-r-preview-7-coding-font-released"
      - run:
          name: Latest Envy Code R download
          command: aws s3api put-object --bucket damieng-static --key "downloads/latest/EnvyCodeR" --website-redirect-location "https://download.damieng.com/fonts/original/EnvyCodeR-PR7.zip"
      - run:
          name: Correct MIME for CSS files
          command: aws s3 cp s3://damieng-static/css/ s3://damieng-static/css/ --metadata-directive="REPLACE" --recursive

GitHub Actions

So how could I go about this in GitHub Actions? I have to admit I spent far too long poking around and examining existing Jekyll actions, but I have a bunch of steps here I need fine control of, especially around Algolia and S3. I finally settled on what turned out to be quite a simple port.

Unlike the CircleCI configuration, I did not split these into two separate jobs because:

  1. There is no exact equivalent to persist_to_workspace and attach_workspace
  2. The alternative of storing and restoring the artifacts leaves large useless artifacts around
  3. I never ended up running the jobs separately
  4. GitHub Actions provides an environment with both Ruby and AWS CLI installed

So, on to the configuration which lives in .github/workflows/jekyll.yml in my case:

name: Build site and deploy

on:
  push:
    branches: [ master ]

jobs:
  build:
    name: Build + Deploy
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Setup Ruby
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: 2.6.1

      - name: Ruby gem cache
        uses: actions/cache@v1
        with:
          path: vendor/bundle
          key: ${{ runner.os }}-gems-${{ hashFiles('**/Gemfile.lock') }}
          restore-keys: |
            ${{ runner.os }}-gems-
            
      - name: Install dependencies
        run: |
          bundle config path vendor/bundle
          bundle install --jobs 4 --retry 3

      - name: Build Jekyll site
        run:  bundle exec jekyll build --config _config.yml,_config-publish.yml

      - name: Remove .html suffixes except for index.html
        run: find _site -name "*.html" -not -name "index.html" | while IFS= read -r f; do mv "$f" "${f%.html}"; done

      - name: Index with Algolia
        env:
          ALGOLIA_API_KEY: ${{ secrets.ALGOLIA_API_KEY }}
        run:  bundle exec jekyll algolia --config _config.yml,_config-publish.yml
    
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
    
      - name: Sync site with S3
        run:  aws s3 sync _site s3://damieng-static/ --delete --content-type=text/html

      - name: Correct MIME types
        run: |
          aws s3 cp s3://damieng-static/ s3://damieng-static/ --exclude "*" --include "*.txt" --metadata-directive="REPLACE" --recursive
          aws s3 cp s3://damieng-static/sitemap.xml s3://damieng-static/sitemap.xml --metadata-directive="REPLACE"
          aws s3 cp s3://damieng-static/feed.xml s3://damieng-static/feed.xml --no-guess-mime-type --content-type "application/atom+xml" --metadata-directive "REPLACE"
          aws s3 cp s3://damieng-static/css/ s3://damieng-static/css/ --metadata-directive "REPLACE" --recursive
          aws s3 cp s3://damieng-static/js/ s3://damieng-static/js/ --metadata-directive "REPLACE" --recursive
      
      - name: Redirect /damieng for existing RSS subscribers
        run:  aws s3api put-object --bucket damieng-static --key "damieng" --website-redirect-location "https://damieng.com/feed.xml"

      - name: Latest Envy Code R redirects
        run:  |
          aws s3api put-object --bucket damieng-static --key "fonts/envy-code-r" --website-redirect-location "https://damieng.com/blog/2008/05/26/envy-code-r-preview-7-coding-font-released"
          aws s3api put-object --bucket damieng-static --key "downloads/latest/EnvyCodeR" --website-redirect-location "https://download.damieng.com/fonts/original/EnvyCodeR-PR7.zip"
        
      - name: Set caching for images at 30 days
        run:  aws s3 cp s3://damieng-static/ s3://damieng-static/ --exclude "*" --include "*.svg" --include "*.ico" --include "*.jpg" --include "*.png" --include "*.webp" --include "*.gif" --recursive --metadata-directive REPLACE --expires 2034-01-01T00:00:00Z --acl public-read --cache-control max-age=2592000,public

      - name: Set caching for CSS and JS at 1 hour
        run:  aws s3 cp s3://damieng-static/ s3://damieng-static/ --exclude "*" --include "*.css" --include "*.js" --recursive --metadata-directive REPLACE --expires 2034-01-01T00:00:00Z --acl public-read --cache-control max-age=3600,public

I also took the opportunity to fix a long-running issue where my S3 objects would lose my manually-applied cache settings (and new posts and files would not have any).
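
For reference, the max-age values in the two caching steps are just the cache lifetimes expressed in seconds. A quick sketch of the arithmetic in shell (the variable names are purely illustrative and not part of the workflow):

```shell
# Cache lifetimes in seconds, as passed to --cache-control max-age=...
# (variable names are illustrative, not part of the workflow)
images_max_age=$((30 * 24 * 60 * 60))   # 30 days
assets_max_age=$((60 * 60))             # 1 hour

echo "images: max-age=${images_max_age},public"
echo "css/js: max-age=${assets_max_age},public"
```

This prints max-age=2592000 for images and max-age=3600 for CSS and JS, matching the values in the workflow.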

Gotchas

There were only a couple of bumps in the road once I decided on a straight port rather than trying to leverage higher-level existing actions:

  1. The secrets configured in the repo settings were not automatically exposed to the commands running in the action. Instead, you have to expose them to each step that needs them using the ${{ secrets.KEY_NAME }} syntax, for example in an env or with block.
  2. I was using rename instead of mv. I don’t recall why. Perhaps it was my Windows-ness creeping in. Rename has been dropped in newer distros.
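
If you want to try the find and mv approach without touching a real build, here is a minimal sketch. It assumes bash, and the scratch directory and file names are made up for the demo. It uses -print0 with read -d '' so that filenames containing spaces survive, which is more defensive than this site strictly needs:

```shell
#!/usr/bin/env bash
set -eu

# Build a throwaway directory that mimics a tiny Jekyll _site output.
# (All file names here are invented for the demo.)
site="$(mktemp -d)"
mkdir -p "$site/blog"
touch "$site/index.html" "$site/about.html" \
      "$site/blog/index.html" "$site/blog/first post.html"

# Strip the .html suffix from everything except index.html using only find and mv.
# -print0 plus read -d '' keeps filenames with spaces in one piece.
find "$site" -name "*.html" -not -name "index.html" -print0 |
  while IFS= read -r -d '' f; do
    mv -- "$f" "${f%.html}"
  done

# List what is left, relative to the scratch site root.
(cd "$site" && find . -type f | sort)
```

After it runs, about.html and blog/first post.html have lost their suffix while both index.html files are untouched.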

Summary

The syntax is surprisingly similar, and build times are about the same as on CircleCI. It’s just nice to have everything in one place. Time to port some other repos over!

[)amien

WordPress to Jekyll part 6 - A faster build

My site goes back to 2004 and is reasonably sized but not massive even with the comments, so waiting 30 seconds for a change to reflect is disappointing.

I investigated several options and ended up using a few in combination, skipping others and holding out for one.

Incremental build

Launching with the --incremental switch, for example bundle exec jekyll serve --incremental, can speed things up a bit, but there are issues. First, it still rebuilds many pages, tags, and categories as it updates. Second, if you’re using collections, it misses rebuilding pages that reference that collection.

Verdict: Not great.

WSL2 or a Mac?

My primary dev box these days is a beefy Windows box. Jekyll doesn’t seem to want to run under native Windows, and Windows Subsystem for Linux (WSL) can be pretty slow compared to a Mac. Microsoft has announced Windows Subsystem for Linux 2 (WSL2), which will be much faster, but it won’t be available to Insiders until June 2019.

Verdict: Hold tight.

Caching

Hubber extraordinaire Ben Balter developed the Jekyll Include Cache which can help speed up building.

The approach it uses is that you can switch out a regular include x for include_cached x, provided that x doesn’t depend on any state/variables. That meant my footer, sidebar, and navbar could all be parsed and stored just once by changing things like {% include footer.html %} to {% include_cached footer.html %}.

There are, however, places where state is required, and this plugin can still help there. For example, my header is called using {% include head.html %} and contained code like this:

<meta name="description" content="{{ page.excerpt | markdownify | strip_html | truncatewords:40 }}" />
<title>{{ page.title }} &raquo; DamienG</title>
<meta name="twitter:url" content="{{ page.url | absolute_url }}" />

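What you can do here is pass parameters to include_cached and read them on the other side. Parameters passed as quoted strings arrive as literal text rather than being evaluated, so compute any filtered values first and then replace the include with:

{% assign head_url = page.url | absolute_url %}
{% assign head_excerpt = page.excerpt | markdownify | strip_html | truncatewords: 40 %}
{% include_cached head.html title=page.title url=head_url excerpt=head_excerpt %}

Then change the include itself to: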

<meta name="description" content="{{ include.excerpt }}" />
<title>{{ include.title }} &raquo; DamienG</title>
<meta name="twitter:url" content="{{ include.url }}" />

This also made the header easier to read, as the title, description, and URL are used multiple times for the various types of metadata (Twitter, OpenGraph, etc.).

Verdict: Some good improvement.

No paging, tags, and collections for local dev

I never actually browse the pages, tags, or collections locally, but they’re essential for the CI and publishing. I’ve manually turned them off in the past, but it’s error-prone and a pain. There is a way to avoid that.

Move the parts of your configuration you don’t want to be part of your local build into a new configuration file. I moved the following from my _config.yml into a new _config-publish.yml:

autopages:
  enabled: true
  categories:
    enabled: true
  tags:
    enabled: true

pagination:
  enabled: true

I also moved my Algolia configuration block here. Now local builds with unpublished content no longer turn up in the search index!

It’s important to remember to add --config _config.yml,_config-publish.yml to any Jekyll build steps on your CI server. For example, mine has:

 - run:
    name:    Build site
    command: bundle exec jekyll build --config _config.yml,_config-publish.yml 2>&1 | tee $JOB_RESULTS_PATH/build-results.txt

and:

 - run:
    name:    Index with Algolia
    command: bundle exec jekyll algolia --config _config.yml,_config-publish.yml

Verdict: Awesome, drops from 30s to 17s for full build.
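
One way to avoid forgetting the extra config is to let a small wrapper script choose the config list. This is only a sketch: jekyll_configs is a hypothetical helper name, and it assumes your CI provider sets CI=true (CircleCI and GitHub Actions both do). It prints the command rather than running it, so you can check the result first.

```shell
# Hypothetical helper: print the --config value for a build.
# The first argument is expected to be the value of $CI
# ("true" on the CI server, empty on a local machine).
jekyll_configs() {
  if [ "${1:-}" = "true" ]; then
    # CI and publishing: base config plus the publish-only settings
    echo "_config.yml,_config-publish.yml"
  else
    # Local development: base config only
    echo "_config.yml"
  fi
}

# Show the command that would run in the current environment.
echo "bundle exec jekyll build --config $(jekyll_configs "${CI:-}")"
```

Locally this prints the build command with only _config.yml, and on a CI server it appends _config-publish.yml.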

Turn off comments for local

I thought this would have more of an impact, but it only shaves a few seconds off the site build times.

To do this, add the following first and last lines to your comments.html include (I’ll probably add this to the Jekyll Blog Comments system):

{% if site.comments.enabled %}
... (existing comments.html contents here)
{% endif %}

and remember to add this to your _config-publish.yml:

comments:
  enabled: true

Verdict: Okay, drops from 17s to 15s for full build.

Conclusion

With these steps, my site has gone from almost 50 seconds to about 17 for a full build without incremental. With incremental, that now means about 1-2 seconds!

I can’t wait for WSL2 to push this over the edge!

[)amien

WordPress to Jekyll part 5 - Hosting & building

The next stage is considering where to host the site and whether to use a content delivery network (CDN). My preferred approach on other sites has been to:

  1. Host the origin on GitHub Pages - it’s fast to build and integrates with my source control
  2. Front it with Amazon’s AWS CloudFront CDN - it’s fast, cheap, and comes with a free SSL cert

Adding the CloudFront CDN used to be essential if you wanted SSL with your domain name, but GitHub Pages has since added support for SSL certs with custom domains.

Unfortunately, my blog is more complex than the other sites I’ve done, and two of the plugins I use have not been white-listed for use on GitHub Pages. They are:

  1. paginate-v2 - required to get great tag & category support
  2. Algolia - needed for search indexing

Part of GitHub’s blazing speed comes from being a trusted environment. While I’m sure they’ll be white-listing paginate-v2 in the short term, I’m not sure whether the Algolia indexer is on the cards.

CircleCI build server

There are always plenty of options in the cloud, so I looked for a build server. I’ve used AppVeyor, CodeShip, and Travis CI before, but this time I decided to go with CircleCI as I wanted to try their new, faster v2 Docker-based infrastructure and take advantage of their free tier.

The v2 mechanism requires a new .circleci/config.yml that splits the process into jobs that are combined with a workflow. I created two jobs - one for the build and another for the deploy. Here is the build job:

version: 2
jobs:
  build:
    docker:
      - image: circleci/ruby:2.3
    working_directory: ~/jekyll
    environment:
      - JEKYLL_ENV=production
      - NOKOGIRI_USE_SYSTEM_LIBRARIES=true
      - JOB_RESULTS_PATH=run-results
    steps:
      - checkout
      - restore_cache:
          key: jekyll-{{ .Branch }}-{{ checksum "Gemfile.lock" }}
      - run:
          name: Install dependencies
          command: bundle check --path=vendor/bundle || bundle install --path=vendor/bundle --jobs=4 --retry=3
      - save_cache:
          key: jekyll-{{ .Branch }}-{{ checksum "Gemfile.lock" }}
          paths:
            - "vendor/bundle"
      - run:
          name: Create results directory
          command: mkdir -p $JOB_RESULTS_PATH
      - run:
          name: Build site
          command: bundle exec jekyll build 2>&1 | tee $JOB_RESULTS_PATH/build-results.txt
      - run:
          name: Remove .html suffixes
          command: find _site -name "*.html" -not -name "index.html" -exec rename -v 's/\.html$//' {} \;
      - run:
          name: Index with Algolia
          command: bundle exec jekyll algolia
      - store_artifacts:
          path: run-results/
          destination: run-results
      - persist_to_workspace:
          root: ~/jekyll
          paths:
            - _site

Origin hosting with S3

Given I’m going to use CloudFront for my CDN and that GitHub Pages won’t work for this job, I went with S3. I know it well, the command-line tools are great, and it’s cheap, fast, and integrates well with CloudFront.

S3 did, however, bring a few problems of its own, primarily because the links on my blog have no file suffixes. I didn’t want either .php or .html, and WordPress makes this a breeze.

Here’s my CircleCI job to deploy to S3. It involves:

  1. Starting with Python to get the AWS command-line tools
  2. Syncing the static site forcing everything as text/html to deal with the lack of file extensions
  3. Fixing up the few files I have that require a different MIME type (css, feed, robots etc)
  4. Creating a few helpful redirects for backward compatibility with existing links in the wild

(This configuration requires that you’ve set up the AWS access key and secret in CircleCI for the command-line tools to use.)

deploy:
  docker:
    - image: circleci/python:2.7
  working_directory: ~/jekyll
  steps:
    - attach_workspace:
        at: ~/jekyll
    - run:
        name: Install awscli
        command: sudo pip install awscli
    - run:
        name: Deploy to S3
        command: aws s3 sync _site s3://damieng-static/ --delete --content-type=text/html
    - run:
        name: Correct MIME for robots.txt automatically
        command: aws s3 cp s3://damieng-static/robots.txt s3://damieng-static/robots.txt --metadata-directive="REPLACE"
    - run:
        name: Correct MIME for sitemap.xml automatically
        command: aws s3 cp s3://damieng-static/sitemap.xml s3://damieng-static/sitemap.xml --metadata-directive="REPLACE"
    - run:
        name: Correct MIME for Atom feed manually
        command: aws s3 cp s3://damieng-static/feed.xml s3://damieng-static/feed.xml --no-guess-mime-type --content-type="application/atom+xml" --metadata-directive="REPLACE"
    - run:
        name: Redirect /damieng for existing RSS subscribers
        command: aws s3api put-object --bucket damieng-static --key "damieng" --website-redirect-location "https://damieng.com/feed.xml"
    - run:
        name: Correct MIME for CSS files
        command: aws s3 cp s3://damieng-static/css s3://damieng-static/css --metadata-directive="REPLACE" --recursive

Tying together the build

Finally, you just need a workflow at the end of your .circleci/config.yml to tie these two jobs together:

workflows:
  version: 2
  build-deploy:
    jobs:
      - build
      - deploy:
          requires:
            - build
          filters:
            branches:
              only: master

A complete version of my circle config is available.

CloudFront CDN

Adding the CloudFront CDN is pretty easy and well covered elsewhere. I’ll just point out that you must paste in the website endpoint domain name from S3 as the origin and not choose the S3 bucket in the dropdown. The latter ties CloudFront to the storage directly and ignores MIME types, redirects, etc. By pasting the origin name in, you’re taking advantage of the S3 static website hosting features that make redirects etc. possible.
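
To make the difference concrete, here is a small sketch that prints the two kinds of origin name for a bucket. The region is an assumption (us-east-1, where the website endpoint uses a dash before the region; some other regions use a dot instead), so check the endpoint shown in your bucket's static website hosting settings rather than trusting this output.

```shell
bucket="damieng-static"
region="us-east-1"   # assumption: substitute your own bucket's region

# Website endpoint: the name to paste into CloudFront so that
# redirects and the other website features keep working.
website_origin="${bucket}.s3-website-${region}.amazonaws.com"

# REST endpoint: roughly what you get by picking the bucket from the dropdown.
rest_origin="${bucket}.s3.amazonaws.com"

echo "website: ${website_origin}"
echo "rest:    ${rest_origin}"
```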

Also, while testing, you might want to specify a low TTL of, say, 120 seconds (2 minutes) until things are fully stable.

[)amien

WordPress to Jekyll part 4 - Categories and tags

Jekyll supports categories and tags directly but doesn’t support pagination of the category and tag list pages. The Paginate-v2 gem solves this and also lets you tweak the URL format.

My site used the URL formats /blog/category/{category-name} and /blog/tag/{tag-name} with 4 articles per page and a little pager at the bottom offering some indication of what page you are on, and some navigation arrows like this:

The pager

In order to render this pager, a little Liquid templating is required. Here’s my _includes/pagination.html that’s included within my multiple-posts layout used on the home page, categories and tag results.

{% if paginator.total_pages > 1 %}
<div class="pagination pagination-centered">
  <ul class="page-numbers">
  {% if paginator.previous_page %}
    <li><a href="{{ paginator.previous_page_path }}" class="prev">«</a></li>
  {% endif %}

  {% if paginator.page_trail %}
    {% for trail in paginator.page_trail %}
    <li>
        {% if page.url == trail.path %}
          <span class="page-numbers current">{{ trail.num }}</span>
        {% endif %}
        {% if page.url != trail.path %}
          <a href="{{ trail.path | prepend: site.baseurl | replace: '//', '/' }}" title="{{ trail.title }}">{{ trail.num }}</a>
        {% endif %}
    </li>
    {% endfor %}
  {% endif %}

  {% if paginator.next_page %}
    <li><a href="{{ paginator.next_page_path }}" class="next">»</a></li>
  {% endif %}
  </ul>
</div>
{% endif %}

Configuring paginate-v2

I configured paginate-v2 as closely as I could to my WordPress install to keep the experience consistent, although the page numbers in the URL are different:

autopages:
  enabled: true
  collections:
    enabled: false
  categories:
    enabled: true
    layouts:
      - home.html
    permalink: '/blog/category/:cat'
    slugify:
      mode: pretty
  tags:
    enabled: true
    layouts:
      - home.html
    permalink: '/blog/tag/:tag'
    slugify:
      mode: pretty

pagination:
  enabled: true
  per_page: 4
  permalink: '/page/:num/'
  title: ':title - page :num'
  limit: 0
  sort_field: 'date'
  sort_reverse: 'true'
  trail:
      before: 2
      after: 2

Auditing categories and tags

Twelve years of blogging and multiple platforms can play havoc with the categories and tags you’ve used over the years. I wrote a quick page that lists all the categories and tags with a count next to each. Anything with only one or two articles is a waste of space, so I’ve been cleaning up.

Here’s that page should you wish to add it to your site to help prune things down.

---
title: Audits
date: 2018-05-30 18:46:00-8:00
---
<h1>Audits</h1>

<h2>Categories</h2>
<ul>
{% for category in site.categories %}
  <li><a href="/blog/category/{{ category | first | replace: ' ', '-' | downcase }}">{{ category | first }}</a> ({{ category[1] | size }})</li>
{% endfor %}
</ul>

<h2>Tags</h2>
<ul>
{% for tag in site.tags %}
  <li><a href="/blog/tag/{{ tag | first | replace: ' ', '-' | downcase }}">{{ tag | first }}</a> ({{ tag[1] | size }})</li>
{% endfor %}
</ul>
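
The links in that page are built by replacing spaces with hyphens and lowercasing the name. Here is the same transformation as a shell sketch, in case you want to check a name by hand. The audit_slug function is a hypothetical helper and "Visual Studio" is just an example name. It only mirrors the two Liquid filters above; paginate-v2's pretty slugify mode may treat punctuation differently, so treat any mismatch as something to verify.

```shell
# Hypothetical helper mirroring the Liquid filters: replace: ' ', '-' | downcase
audit_slug() {
  printf '%s' "$1" | tr ' ' '-' | tr '[:upper:]' '[:lower:]'
}

# Example category name (made up for the demo), followed by a newline.
audit_slug "Visual Studio"; echo
```

For the example name this prints visual-studio, which is the last path segment of the audit link.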

See you in part 5 - hosting.

[)amien