Importing BlogML into WordPress
I’ve been trying to get my content out of Subtext and into WordPress. It shouldn’t be difficult, but Subtext only supports the blog-independent BlogML format, and whilst WordPress supports a number of import formats, BlogML isn’t one of them. For export, WordPress only supports its own WXR format, although the BlogML guys have an exporter available.
The first idea was to put together an XSL transform to convert BlogML to WXR.
BlogML format
BlogML posts look like this, although as of Subtext 1.9.3 the export fails to populate the views attribute and doesn’t even emit a tag for the user-email. It also doesn’t include a field for a commenter’s IP address. The missing email and IP address mean no Gravatars or Identicons at the other end right now.
<post id="1" date-created="2006-04-24T04:07:00" date-modified="2006-04-25T11:55:00" approved="true"
post-url="https://damieng.com/blog/archive/2008/01/30/Test.aspx" type="normal" hasexcerpt="false" views="0">
<title type="text"><![CDATA[This is a test]]></title>
<content type="text"><![CDATA[Just testing content]]></content>
<post-name type="text"><![CDATA[ThisIsATest]]></post-name>
<categories>
<category ref="1" />
</categories>
<authors>
<author ref="1" />
</authors>
</post>
WXR format
WXR posts are extended RSS items and, annoyingly, don’t have a field for view counts at all.
<item>
<title>This is a test</title>
<link>https://damieng.com/blog/archive/2008/01/30/Test.aspx</link>
<pubDate>Mon, 24 Apr 2006 04:07:00 +0000</pubDate>
<dc:creator>Damien Guard</dc:creator>
<guid isPermaLink="false">https://damieng.com/blog/archive/2008/01/30/Test.aspx</guid>
<description></description>
<content:encoded><![CDATA[Just testing content]]></content:encoded>
<wp:post_id>1</wp:post_id>
<wp:post_date>2006-04-24 04:07:00</wp:post_date>
<wp:post_date_gmt>2006-04-24 04:07:00</wp:post_date_gmt>
<wp:comment_status>open</wp:comment_status>
<wp:ping_status>open</wp:ping_status>
<wp:post_name>about</wp:post_name>
<wp:status>publish</wp:status>
<wp:post_parent>0</wp:post_parent>
<wp:menu_order>0</wp:menu_order>
<wp:post_type>post</wp:post_type>
</item>
Convert BlogML to WXR using XSLT
There are a few things to bear in mind when using this transform:
- Link and guid tags are populated but WordPress seems to ignore them. Will investigate soon!
- Time-zone conversion does not take place – hand-code +offsets in the XSLT to deal with your zone.
- Track-backs not yet considered.
- Default namespace in BlogML is not handled – remove the xmlns="…" declaration from your BlogML file before transforming.
- HTML within comments is not supported – when I enabled this WordPress treated the HTML as text.
- Embedded attachments are not supported.
- Edit the primary site link at channel/link in the transformed file to match your site – BlogML doesn’t include it.
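An alternative to editing the namespace out of the export by hand would be to bind the BlogML namespace to a prefix in the stylesheet. The following is only a minimal sketch, not the transform itself: the prefix b is arbitrary, and the namespace URI is an assumption that should be checked against the xmlns value on the root element of your own export.

```xml
<!-- Sketch only: the namespace URI below is an assumption; copy the real
     value from the xmlns attribute on your BlogML root element. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:b="http://www.blogml.com/2006/09/BlogML"
    exclude-result-prefixes="b">

  <xsl:output method="xml" indent="yes"/>

  <!-- Every element name in a match or select needs the b: prefix;
       attributes such as @post-url are in no namespace and stay unprefixed. -->
  <xsl:template match="b:blog">
    <channel>
      <title><xsl:value-of select="b:title"/></title>
      <xsl:apply-templates select=".//b:post"/>
    </channel>
  </xsl:template>

  <xsl:template match="b:post">
    <item>
      <title><xsl:value-of select="b:title"/></title>
      <link><xsl:value-of select="@post-url"/></link>
    </item>
  </xsl:template>

</xsl:stylesheet>
```

The exclude-result-prefixes attribute keeps the b declaration from leaking into the generated WXR. Applying the same idea to the full transform would mean prefixing every element name in its match and select expressions.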
Multiple authors and categories should work just fine, so throw this file and your BlogML export through an XSLT processor and presto: WXR content ready for import.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:wp="http://wordpress.org/export/1.0/"
xmlns:content="http://purl.org/rss/1.0/modules/content/">
<xsl:output method="xml" indent="yes" cdata-section-elements="content:encoded"/>
<xsl:template match="/">
<rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:wp="http://wordpress.org/export/1.0/">
<xsl:apply-templates />
</rss>
</xsl:template>
<xsl:template match="blog">
<channel>
<title><xsl:value-of select="title"/></title>
<link>http://addsitehere</link>
<description><xsl:value-of select="sub-title"/></description>
<pubDate>
<xsl:call-template name="topubdate">
<xsl:with-param name="date" select="@date-created" />
</xsl:call-template>
</pubDate>
<generator>DamienG's BlogML to WordPress transform</generator>
<language>en</language>
<xsl:apply-templates />
</channel>
</xsl:template>
<xsl:template match="blog/categories/category">
<wp:category>
<wp:category_nicename>
<xsl:value-of select="title"/>
</wp:category_nicename>
<wp:category_parent></wp:category_parent>
<wp:posts_private>0</wp:posts_private>
<wp:links_private>0</wp:links_private>
<wp:cat_name>
<xsl:value-of select="@description"/>
</wp:cat_name>
</wp:category>
</xsl:template>
<xsl:template match="post">
<item>
<title>
<xsl:value-of select="title"/>
</title>
<link>
<xsl:value-of select="@post-url"/>
</link>
<pubDate>
<xsl:call-template name="topubdate">
<xsl:with-param name="date" select="@date-created" />
</xsl:call-template>
</pubDate>
<dc:creator>
<xsl:variable name="authorref" select="authors/author/@ref" />
<xsl:value-of select="//author[@id=$authorref]/title"/>
</dc:creator>
<xsl:apply-templates select="categories" />
<guid isPermaLink="false">
<xsl:value-of select="@post-url"/>
</guid>
<description></description>
<content:encoded>
<xsl:value-of select="content" disable-output-escaping="yes"/>
</content:encoded>
<wp:post_id>
<xsl:value-of select="@id"/>
</wp:post_id>
<!-- post_date is the publish date, so use date-created to match pubDate -->
<wp:post_date>
<xsl:value-of select="translate(@date-created,'T',' ')"/>
</wp:post_date>
<wp:post_date_gmt>
<xsl:value-of select="translate(@date-created,'T',' ')"/>
</wp:post_date_gmt>
<wp:comment_status>open</wp:comment_status>
<wp:ping_status>open</wp:ping_status>
<wp:post_name>
<xsl:value-of select="post-name"/>
</wp:post_name>
<wp:status>
<xsl:choose>
<xsl:when test="@approved='true'">publish</xsl:when>
<xsl:otherwise>draft</xsl:otherwise>
</xsl:choose>
</wp:status>
<wp:post_parent>0</wp:post_parent>
<wp:menu_order>0</wp:menu_order>
<wp:post_type>post</wp:post_type>
<xsl:apply-templates />
</item>
</xsl:template>
<xsl:template match="comment">
<wp:comment>
<wp:comment_id>
<xsl:value-of select="@id"/>
</wp:comment_id>
<wp:comment_author>
<xsl:value-of select="@user-name"/>
</wp:comment_author>
<wp:comment_author_email></wp:comment_author_email>
<wp:comment_author_url>
<xsl:value-of select="@user-url"/>
</wp:comment_author_url>
<wp:comment_author_IP></wp:comment_author_IP>
<wp:comment_date>
<xsl:value-of select="translate(@date-created,'T',' ')"/>
</wp:comment_date>
<wp:comment_date_gmt>
<xsl:value-of select="translate(@date-created,'T',' ')"/>
</wp:comment_date_gmt>
<wp:comment_content>
<xsl:value-of select="content"/>
</wp:comment_content>
<wp:comment_approved>
<xsl:choose>
<xsl:when test="@approved='true'">1</xsl:when>
<xsl:otherwise>0</xsl:otherwise>
</xsl:choose>
</wp:comment_approved>
<wp:comment_type></wp:comment_type>
<wp:comment_parent>0</wp:comment_parent>
</wp:comment>
</xsl:template>
<xsl:template match="post/categories/category">
<category>
<xsl:variable name="catref" select="@ref" />
<xsl:value-of select="/blog/categories/category[@id=$catref]/title"/>
</category>
</xsl:template>
<xsl:template name="topubdate">
<xsl:param name="date" />
<xsl:value-of select="substring($date,9,2)" />
<xsl:value-of select="' '" />
<xsl:call-template name="monthname">
<xsl:with-param name="month" select="substring($date,6,2)" />
</xsl:call-template>
<xsl:value-of select="' '" />
<xsl:value-of select="substring($date,1,4)" />
<xsl:value-of select="' '" />
<xsl:value-of select="substring($date,12,8)" /><xsl:text> +0000</xsl:text>
</xsl:template>
<xsl:template name="monthname">
<xsl:param name="month" />
<xsl:choose>
<xsl:when test="$month='01'">Jan</xsl:when>
<xsl:when test="$month='02'">Feb</xsl:when>
<xsl:when test="$month='03'">Mar</xsl:when>
<xsl:when test="$month='04'">Apr</xsl:when>
<xsl:when test="$month='05'">May</xsl:when>
<xsl:when test="$month='06'">Jun</xsl:when>
<xsl:when test="$month='07'">Jul</xsl:when>
<xsl:when test="$month='08'">Aug</xsl:when>
<xsl:when test="$month='09'">Sep</xsl:when>
<xsl:when test="$month='10'">Oct</xsl:when>
<xsl:when test="$month='11'">Nov</xsl:when>
<xsl:when test="$month='12'">Dec</xsl:when>
</xsl:choose>
</xsl:template>
<xsl:template match="text()" />
</xsl:stylesheet>
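Rather than hand-editing the hard-coded +0000, the offset could be lifted into a stylesheet parameter. This is a sketch under assumptions: the parameter name tzoffset is made up for illustration, and the month lookup is a compact stand-in for the monthname template above, not the original code.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: the UTC offset your Subtext timestamps were
       recorded in, written RFC 822 style, e.g. +0100 or -0800. -->
  <xsl:param name="tzoffset" select="'+0000'"/>

  <!-- Turns 2006-04-24T04:07:00 into 24 Apr 2006 04:07:00 plus the offset. -->
  <xsl:template name="topubdate">
    <xsl:param name="date"/>
    <xsl:value-of select="substring($date,9,2)"/>
    <xsl:text> </xsl:text>
    <!-- Three-letter month name looked up by position in a packed string. -->
    <xsl:value-of select="substring('JanFebMarAprMayJunJulAugSepOctNovDec',
                                    (number(substring($date,6,2)) - 1) * 3 + 1, 3)"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="substring($date,1,4)"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="substring($date,12,8)"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="$tzoffset"/>
  </xsl:template>

</xsl:stylesheet>
```

Note that this only labels pubDate with the zone; it does not shift the clock time, so wp:post_date_gmt would still need adjusting separately. With xsltproc, for example, the value can be supplied on the command line with --stringparam tzoffset +0100.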
In conclusion
I really don’t want to give up email addresses and IP addresses, which leaves me two options:
- Write an ASPX page that rips the content directly out of the Subtext tables and formats it as WXR, bypassing BlogML
- Extend the Subtext export facility to add the missing fields and transform them from there
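If the second route pans out, the comment template would only need its two empty elements filled in. A rough sketch, assuming the extended export writes the email as a user-email attribute on each comment and the IP address as a user-ip attribute (the latter name is purely hypothetical, since BlogML has no such field today):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:wp="http://wordpress.org/export/1.0/">

  <!-- Trimmed-down comment template showing only the author fields. -->
  <xsl:template match="comment">
    <wp:comment>
      <wp:comment_author>
        <xsl:value-of select="@user-name"/>
      </wp:comment_author>
      <!-- Assumed attribute: empty output if the export omits it. -->
      <wp:comment_author_email>
        <xsl:value-of select="@user-email"/>
      </wp:comment_author_email>
      <!-- Hypothetical attribute: not part of BlogML as exported today. -->
      <wp:comment_author_IP>
        <xsl:value-of select="@user-ip"/>
      </wp:comment_author_IP>
    </wp:comment>
  </xsl:template>

</xsl:stylesheet>
```

Selecting an attribute that is absent yields an empty string, so a template written this way would behave exactly like the current one on an unextended export.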
I’ll let you know where I go from here…
[)amien