Importing BlogML into WordPress

I’ve been trying to get my content out of Subtext and into WordPress. It shouldn’t be difficult, but Subtext only exports the blog-independent BlogML format, and whilst WordPress supports a number of import formats, BlogML isn’t one of them. For export, WordPress only supports its own WXR format, although the BlogML guys have an exporter available.

The first idea was to put together an XSL transform to convert BlogML to WXR.

BlogML format

BlogML posts look like this, although as of 1.9.3 Subtext fails to populate the views attribute and doesn’t emit anything at all for a commenter’s user-email. It also doesn’t include a field for a commenter’s IP address. Those two comment limitations mean no Gravatars or Identicons at the other end right now.

<post id="1" date-created="2006-04-24T04:07:00" date-modified="2006-04-25T11:55:00" approved="true"
    post-url="https://damieng.com/blog/archive/2008/01/30/Test.aspx" type="normal" hasexcerpt="false" views="0">
  <title type="text"><![CDATA[This is a test]]></title>
  <content type="text"><![CDATA[Just testing content]]></content>
  <post-name type="text"><![CDATA[ThisIsATest]]></post-name>
  <categories>
    <category ref="1" />
  </categories>
  <authors>
    <author ref="1" />
  </authors>
</post>
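
Before committing to a migration path it can help to measure how much commenter data an export is actually missing. The following is a minimal Python sketch, not part of the transform: it counts elements that lack a given attribute. The `user-email` attribute name and the helper names are assumptions for illustration; check them against your own export.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop any '{namespace}' prefix ElementTree adds to a tag name."""
    return tag.rsplit("}", 1)[-1]


def count_missing(blogml_text, element, attribute):
    """Return (missing, total) for elements lacking a non-empty attribute."""
    root = ET.fromstring(blogml_text)
    nodes = [node for node in root.iter() if local_name(node.tag) == element]
    missing = sum(1 for node in nodes if not node.get(attribute))
    return missing, len(nodes)
```

For example, `count_missing(text, "comment", "user-email")` reports how many comments would arrive in WordPress without an email address.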

WXR format

WXR posts are extended RSS items and, annoyingly, don’t have a field for view counts at all.

<item>
  <title>This is a test</title>
  <link>https://damieng.com/blog/archive/2008/01/30/Test.aspx</link>
  <pubDate>Mon, 24 Apr 2006 04:07:00 +0000</pubDate>
  <dc:creator>Damien Guard</dc:creator>
  <guid isPermaLink="false">https://damieng.com/blog/archive/2008/01/30/Test.aspx</guid>
  <description></description>
  <content:encoded><![CDATA[Just testing content]]></content:encoded>
  <wp:post_id>1</wp:post_id>
  <wp:post_date>2006-04-24 04:07:00</wp:post_date>
  <wp:post_date_gmt>2006-04-24 04:07:00</wp:post_date_gmt>
  <wp:comment_status>open</wp:comment_status>
  <wp:ping_status>open</wp:ping_status>
  <wp:post_name>about</wp:post_name>
  <wp:status>publish</wp:status>
  <wp:post_parent>0</wp:post_parent>
  <wp:menu_order>0</wp:menu_order>
  <wp:post_type>post</wp:post_type>
</item>

Convert BlogML to WXR using XSLT

There are a few things to bear in mind when using this transform:

  • Link and guid tags are populated but WordPress seems to ignore them. Will investigate soon!
  • Time-zone conversion does not take place – hand-code +offsets in the XSLT to deal with your zone.
  • Track-backs not yet considered.
  • Default namespace in BlogML is not handled – remove the xmlns="…" declaration from your BlogML file before transforming.
  • HTML within comments is not supported – when I enabled this WordPress treated the HTML as text.
  • Embedded attachments are not supported.
  • Edit the primary site link at channel/link in the transformed file to match your site – BlogML doesn’t include it.
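
The namespace caveat above can be handled with a small pre-processing step instead of hand-editing the export. This is a rough Python sketch under the assumption that the default namespace is declared once, on the root element; `strip_default_namespace` is a hypothetical helper name, and prefixed declarations such as xmlns:xsi are deliberately left alone.

```python
import re

# Matches a default-namespace declaration, e.g. xmlns="..." or xmlns='...'.
# Prefixed declarations (xmlns:something="...") do not match.
DEFAULT_XMLNS = re.compile(r"""\sxmlns\s*=\s*("[^"]*"|'[^']*')""")


def strip_default_namespace(xml_text):
    """Remove the first default xmlns declaration found in the text.

    Assumes the first occurrence is the one on the root element.
    """
    return DEFAULT_XMLNS.sub("", xml_text, count=1)
```

Read the export as text, pass it through `strip_default_namespace`, and save the result under a new name so the original file stays untouched.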

Multiple authors and categories should work just fine, so throw the stylesheet below and your BlogML export through an XSLT processor and presto: WXR content ready for import.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                              xmlns:dc="http://purl.org/dc/elements/1.1/"
                              xmlns:wp="http://wordpress.org/export/1.0/"
                              xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <xsl:output method="xml" indent="yes" cdata-section-elements="content:encoded"/>

  <xsl:template match="/">
    <rss version="2.0"
        xmlns:content="http://purl.org/rss/1.0/modules/content/"
        xmlns:wfw="http://wellformedweb.org/CommentAPI/"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:wp="http://wordpress.org/export/1.0/">
      <xsl:apply-templates />
    </rss>
  </xsl:template>

  <xsl:template match="blog">
    <channel>
      <title><xsl:value-of select="title"/></title>
      <link>http://addsitehere</link>
      <description><xsl:value-of select="sub-title"/></description>
      <pubDate>
        <xsl:call-template name="topubdate">
          <xsl:with-param name="date" select="@date-created" />
        </xsl:call-template>
      </pubDate>
      <generator>DamienG's BlogML to WordPress transform</generator>
      <language>en</language>
      <xsl:apply-templates />
    </channel>
  </xsl:template>

  <xsl:template match="blog/categories/category">
    <wp:category>
      <wp:category_nicename>
        <xsl:value-of select="title"/>
      </wp:category_nicename>
      <wp:category_parent></wp:category_parent>
      <wp:posts_private>0</wp:posts_private>
      <wp:links_private>0</wp:links_private>
      <wp:cat_name>
        <xsl:value-of select="@description"/>
      </wp:cat_name>
    </wp:category>
  </xsl:template>

  <xsl:template match="post">
    <item>
      <title>
        <xsl:value-of select="title"/>
      </title>
      <link>
        <xsl:value-of select="@post-url"/>
      </link>
      <pubDate>
        <xsl:call-template name="topubdate">
          <xsl:with-param name="date" select="@date-created" />
        </xsl:call-template>
      </pubDate>
      <dc:creator>
        <xsl:variable name="authorref" select="authors/author/@ref" />
        <xsl:value-of select="//author[@id=$authorref]/title"/>
      </dc:creator>
      <xsl:apply-templates select="categories" />
      <guid isPermaLink="false">
        <xsl:value-of select="@post-url"/>
      </guid>
      <description></description>
      <content:encoded>
        <xsl:value-of select="content" disable-output-escaping="yes"/>
      </content:encoded>
      <wp:post_id>
        <xsl:value-of select="@id"/>
      </wp:post_id>
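      <!-- Use the creation date, matching pubDate above; date-modified would stamp posts with their last edit time -->
      <wp:post_date>
        <xsl:value-of select="translate(@date-created,'T',' ')"/>
      </wp:post_date>
      <wp:post_date_gmt>
        <xsl:value-of select="translate(@date-created,'T',' ')"/>
      </wp:post_date_gmt>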
      <wp:comment_status>open</wp:comment_status>
      <wp:ping_status>open</wp:ping_status>
      <wp:post_name>
        <xsl:value-of select="post-name"/>
      </wp:post_name>
      <wp:status>
        <xsl:choose>
          <xsl:when test="@approved='true'">publish</xsl:when>
          <xsl:otherwise>draft</xsl:otherwise>
        </xsl:choose>
      </wp:status>
      <wp:post_parent>0</wp:post_parent>
      <wp:menu_order>0</wp:menu_order>
      <wp:post_type>post</wp:post_type>
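      <!-- Categories were already emitted above, so skip them here to avoid duplicate <category> elements -->
      <xsl:apply-templates select="*[not(self::categories)]" />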
    </item>
  </xsl:template>

  <xsl:template match="comment">
    <wp:comment>
      <wp:comment_id>
        <xsl:value-of select="@id"/>
      </wp:comment_id>
      <wp:comment_author>
        <xsl:value-of select="@user-name"/>
      </wp:comment_author>
      <wp:comment_author_email></wp:comment_author_email>
      <wp:comment_author_url>
        <xsl:value-of select="@user-url"/>
      </wp:comment_author_url>
      <wp:comment_author_IP></wp:comment_author_IP>
      <wp:comment_date>
        <xsl:value-of select="translate(@date-created,'T',' ')"/>
      </wp:comment_date>
      <wp:comment_date_gmt>
        <xsl:value-of select="translate(@date-created,'T',' ')"/>
      </wp:comment_date_gmt>
      <wp:comment_content>
        <xsl:value-of select="content"/>
      </wp:comment_content>
      <wp:comment_approved>
        <xsl:choose>
          <xsl:when test="@approved='true'">1</xsl:when>
          <xsl:otherwise>0</xsl:otherwise>
        </xsl:choose>
      </wp:comment_approved>
      <wp:comment_type></wp:comment_type>
      <wp:comment_parent>0</wp:comment_parent>
    </wp:comment>
  </xsl:template>

  <xsl:template match="post/categories/category">
    <category>
      <xsl:variable name="catref" select="@ref" />
      <xsl:value-of select="/blog/categories/category[@id=$catref]/title"/>
    </category>
  </xsl:template>

  <xsl:template name="topubdate">
    <xsl:param name="date" />
    <xsl:value-of select="substring($date,9,2)" />
    <xsl:value-of select="' '" />
    <xsl:call-template name="monthname">
      <xsl:with-param name="month" select="substring($date,6,2)" />
    </xsl:call-template>
    <xsl:value-of select="' '" />
    <xsl:value-of select="substring($date,1,4)" />
    <xsl:value-of select="' '" />
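    <xsl:value-of select="substring($date,12,8)" />
    <!-- xsl:text keeps the trailing line break and indentation out of pubDate -->
    <xsl:text> +0000</xsl:text>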
  </xsl:template>

  <xsl:template name="monthname">
    <xsl:param name="month" />
    <xsl:choose>
      <xsl:when test="$month='01'">Jan</xsl:when>
      <xsl:when test="$month='02'">Feb</xsl:when>
      <xsl:when test="$month='03'">Mar</xsl:when>
      <xsl:when test="$month='04'">Apr</xsl:when>
      <xsl:when test="$month='05'">May</xsl:when>
      <xsl:when test="$month='06'">Jun</xsl:when>
      <xsl:when test="$month='07'">Jul</xsl:when>
      <xsl:when test="$month='08'">Aug</xsl:when>
      <xsl:when test="$month='09'">Sep</xsl:when>
      <xsl:when test="$month='10'">Oct</xsl:when>
      <xsl:when test="$month='11'">Nov</xsl:when>
      <xsl:when test="$month='12'">Dec</xsl:when>
    </xsl:choose>
  </xsl:template>
  <xsl:template match="text()" />
</xsl:stylesheet>
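
The stylesheet writes the same timestamp into both wp:post_date and wp:post_date_gmt and always stamps pubDate with +0000. If your BlogML dates are in local time, the three values can be worked out separately. Here is a minimal Python sketch of that arithmetic, assuming the export holds local times and that one fixed UTC offset is good enough (it ignores daylight saving changes); `to_wxr_dates` is a hypothetical helper, not something WordPress or BlogML provides.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def to_wxr_dates(blogml_date, utc_offset_hours):
    """Turn a BlogML timestamp into the three date strings a WXR item carries.

    blogml_date is assumed to be local time, e.g. '2006-04-24T04:07:00'.
    """
    local = datetime.strptime(blogml_date, "%Y-%m-%dT%H:%M:%S")
    local = local.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    gmt = local.astimezone(timezone.utc)
    return {
        "pubDate": format_datetime(gmt),  # RFC 2822 style with +0000
        "post_date": local.strftime("%Y-%m-%d %H:%M:%S"),
        "post_date_gmt": gmt.strftime("%Y-%m-%d %H:%M:%S"),
    }
```

With an offset of one hour, the sample post's 04:07 local time becomes 03:07 in post_date_gmt and pubDate while post_date stays at 04:07.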

In conclusion

I really don’t want to give up email addresses and IP addresses, which leaves me two options:

  1. Write an ASPX page that rips the content directly out of the Subtext tables and formats it as WXR, bypassing BlogML
  2. Extend the Subtext export facility to add the missing fields and transform them from there

I’ll let you know where I go from here…

[)amien

2 responses

  1. steve, 1 August 2007

    Given the popularity of blogging and the number of packages out there you wouldn't think this area would still be so unstandardised. I've moved software a couple of times (hence why my blog goes back to late 2003) but I've always converted using direct database scripts.

    I've been very happy with WordPress, all I would say is be sure to install the FCKEditor plugin.

  2. Rob Conery, 17 August 2007

    Damien give me a shout about this - I just had to make the migration myself (ONLY because I think I have to move servers and don't want to deal with my blog).

    I created a console app that pulls the data, formats all of the stuff, and pumps out WXR bits. I'll send to you if you want.

    Rob