Posts tagged with blogml

Importing BlogML into WordPress

I’ve been trying to get my content out of Subtext and into WordPress. It shouldn’t be difficult, but Subtext only exports the blog-independent BlogML format, and whilst WordPress supports a number of import formats, BlogML isn’t one of them. For export, WordPress only supports its own WXR format, although the BlogML guys have an exporter available.

The first idea was to put together an XSL transform to convert BlogML to WXR.

BlogML format

BlogML posts look like this, although as of 1.9.3 Subtext fails to populate the views attribute and doesn’t emit the user-email at all. It also doesn’t include a field for a commenter’s IP address. These two omissions mean no Gravatars or Identicons at the other end right now.

<post id="1" date-created="2006-04-24T04:07:00" date-modified="2006-04-25T11:55:00" approved="true"
    post-url="" type="normal" hasexcerpt="false" views="0">
  <title type="text"><![CDATA[This is a test]]></title>
  <content type="text"><![CDATA[Just testing content]]></content>
  <post-name type="text"><![CDATA[ThisIsATest]]></post-name>
  <categories>
    <category ref="1" />
  </categories>
  <authors>
    <author ref="1" />
  </authors>
</post>

WXR format

WXR posts are extended RSS items and, annoyingly, don’t have a field for view counts at all.

<item>
  <title>This is a test</title>
  <pubDate>Mon, 24 Apr 2006 04:07:00 +0000</pubDate>
  <dc:creator>Damien Guard</dc:creator>
  <guid isPermaLink="false"></guid>
  <content:encoded><![CDATA[Just testing content]]></content:encoded>
  <wp:post_date>2006-04-24 04:07:00</wp:post_date>
  <wp:post_date_gmt>2006-04-24 04:07:00</wp:post_date_gmt>
</item>

Convert BlogML to WXR using XSLT

There are a few things to bear in mind when using this transform:

  • Link and guid tags are populated but WordPress seems to ignore them. Will investigate soon!
  • Time-zone conversion does not take place – hand-code +offsets in the XSLT to deal with your zone.
  • Track-backs not yet considered.
  • Default namespace in BlogML is not handled – remove the xmlns="…" declaration from your BlogML file before transforming.
  • HTML within comments is not supported – when I enabled this, WordPress treated the HTML as text.
  • Embedded attachments are not supported.
  • Edit the primary site link at channel/link in the transformed file to match your site – BlogML doesn’t include it.
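
The default-namespace point above can also be handled in code rather than by hand. The following is a minimal Python sketch, assuming the standard library's ElementTree is acceptable for pre-processing; `strip_namespaces` is a hypothetical helper name and the namespace URI in the sample is a placeholder, not the real BlogML one.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root):
    """Remove the '{uri}' prefix ElementTree puts on namespaced tag names, in place."""
    for element in root.iter():
        if isinstance(element.tag, str) and element.tag.startswith("{"):
            element.tag = element.tag.split("}", 1)[1]
    return root


# Hypothetical input: the namespace URI is a placeholder for illustration only.
SAMPLE = '<blog xmlns="urn:example:blogml"><posts><post id="1"/></posts></blog>'

root = strip_namespaces(ET.fromstring(SAMPLE))
cleaned = ET.tostring(root, encoding="unicode")
print(cleaned)  # the xmlns declaration is gone, so un-prefixed XPath matches work
```

One caveat: ElementTree does not preserve CDATA sections, so post content is written back as escaped text. That is equivalent XML, but the file will look different from the original export.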

Multiple authors and categories should work just fine so throw this file and your BlogML export through an XSLT processor and presto, WXR content ready for import.

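<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:content="http://purl.org/rss/1.0/modules/content/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:wp="http://wordpress.org/export/1.0/">
  <!-- Output element names follow the WXR format; the wp: namespace URI assumes WXR 1.0. -->
  <xsl:output method="xml" indent="yes" cdata-section-elements="content:encoded"/>

  <xsl:template match="/">
    <rss version="2.0">
      <xsl:apply-templates />
    </rss>
  </xsl:template>

  <!-- Blog header becomes the RSS channel -->
  <xsl:template match="blog">
    <channel>
      <title><xsl:value-of select="title"/></title>
      <link></link>
      <description><xsl:value-of select="sub-title"/></description>
      <pubDate>
        <xsl:call-template name="topubdate">
          <xsl:with-param name="date" select="@date-created" />
        </xsl:call-template>
      </pubDate>
      <generator>DamienG's BlogML to WordPress transform</generator>
      <xsl:apply-templates />
    </channel>
  </xsl:template>

  <!-- Category definitions -->
  <xsl:template match="blog/categories/category">
    <wp:category>
      <wp:cat_name>
        <xsl:value-of select="title"/>
      </wp:cat_name>
      <wp:category_description>
        <xsl:value-of select="@description"/>
      </wp:category_description>
    </wp:category>
  </xsl:template>

  <!-- Each post becomes an RSS item -->
  <xsl:template match="post">
    <item>
      <title>
        <xsl:value-of select="title"/>
      </title>
      <link>
        <xsl:value-of select="@post-url"/>
      </link>
      <pubDate>
        <xsl:call-template name="topubdate">
          <xsl:with-param name="date" select="@date-created" />
        </xsl:call-template>
      </pubDate>
      <dc:creator>
        <xsl:variable name="authorref" select="authors/author/@ref" />
        <xsl:value-of select="//author[@id=$authorref]/title"/>
      </dc:creator>
      <xsl:apply-templates select="categories" />
      <guid isPermaLink="false">
        <xsl:value-of select="@post-url"/>
      </guid>
      <content:encoded>
        <xsl:value-of select="content" disable-output-escaping="yes"/>
      </content:encoded>
      <wp:post_id>
        <xsl:value-of select="@id"/>
      </wp:post_id>
      <wp:post_date>
        <xsl:value-of select="translate(@date-modified,'T',' ')"/>
      </wp:post_date>
      <wp:post_date_gmt>
        <xsl:value-of select="translate(@date-modified,'T',' ')"/>
      </wp:post_date_gmt>
      <wp:post_name>
        <xsl:value-of select="post-name"/>
      </wp:post_name>
      <wp:status>
        <xsl:choose>
          <xsl:when test="@approved='true'">publish</xsl:when>
        </xsl:choose>
      </wp:status>
      <xsl:apply-templates />
    </item>
  </xsl:template>

  <!-- Comments nest inside their post's item -->
  <xsl:template match="comment">
    <wp:comment>
      <wp:comment_id>
        <xsl:value-of select="@id"/>
      </wp:comment_id>
      <wp:comment_author>
        <xsl:value-of select="@user-name"/>
      </wp:comment_author>
      <wp:comment_author_url>
        <xsl:value-of select="@user-url"/>
      </wp:comment_author_url>
      <wp:comment_date>
        <xsl:value-of select="translate(@date-created,'T',' ')"/>
      </wp:comment_date>
      <wp:comment_date_gmt>
        <xsl:value-of select="translate(@date-created,'T',' ')"/>
      </wp:comment_date_gmt>
      <wp:comment_content>
        <xsl:value-of select="content"/>
      </wp:comment_content>
      <wp:comment_approved>
        <xsl:choose>
          <xsl:when test="@approved='true'">1</xsl:when>
        </xsl:choose>
      </wp:comment_approved>
    </wp:comment>
  </xsl:template>

  <!-- Resolve a post's category reference to the category title -->
  <xsl:template match="post/categories/category">
    <category>
      <xsl:variable name="catref" select="@ref" />
      <xsl:value-of select="/blog/categories/category[@id=$catref]/title"/>
    </category>
  </xsl:template>

  <!-- Format an ISO timestamp as "DD Mon YYYY HH:MM:SS +0000" -->
  <xsl:template name="topubdate">
    <xsl:param name="date" />
    <xsl:value-of select="substring($date,9,2)" />
    <xsl:value-of select="' '" />
    <xsl:call-template name="monthname">
      <xsl:with-param name="month" select="substring($date,6,2)" />
    </xsl:call-template>
    <xsl:value-of select="' '" />
    <xsl:value-of select="substring($date,1,4)" />
    <xsl:value-of select="' '" />
    <xsl:value-of select="substring($date,12,8)" />
    <xsl:text> +0000</xsl:text>
  </xsl:template>

  <xsl:template name="monthname">
    <xsl:param name="month" />
    <xsl:choose>
      <xsl:when test="$month='01'">Jan</xsl:when>
      <xsl:when test="$month='02'">Feb</xsl:when>
      <xsl:when test="$month='03'">Mar</xsl:when>
      <xsl:when test="$month='04'">Apr</xsl:when>
      <xsl:when test="$month='05'">May</xsl:when>
      <xsl:when test="$month='06'">Jun</xsl:when>
      <xsl:when test="$month='07'">Jul</xsl:when>
      <xsl:when test="$month='08'">Aug</xsl:when>
      <xsl:when test="$month='09'">Sep</xsl:when>
      <xsl:when test="$month='10'">Oct</xsl:when>
      <xsl:when test="$month='11'">Nov</xsl:when>
      <xsl:when test="$month='12'">Dec</xsl:when>
    </xsl:choose>
  </xsl:template>

  <!-- Suppress any text not explicitly selected above -->
  <xsl:template match="text()" />
</xsl:stylesheet>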
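
If hand-coding offsets in the XSLT gets tedious, the date handling can be checked or pre-computed outside the transform. Here is a rough Python sketch of the same conversion with a time-zone offset applied. The function name `blogml_dates` and the `utc_offset_hours` parameter are my own inventions for illustration, and unlike the template this version also emits the day name.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def blogml_dates(stamp, utc_offset_hours=0):
    """Return (pubDate, post_date, post_date_gmt) for a BlogML timestamp.

    utc_offset_hours is an assumed parameter: the blog's local offset from GMT.
    """
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=local_zone)
    gmt = local.astimezone(timezone.utc)
    wxr_format = "%Y-%m-%d %H:%M:%S"
    # format_datetime produces the RFC 822 style date that RSS pubDate expects
    return format_datetime(local), local.strftime(wxr_format), gmt.strftime(wxr_format)


print(blogml_dates("2006-04-24T04:07:00"))
print(blogml_dates("2006-04-24T04:07:00", utc_offset_hours=1))
```

With an offset of one hour the local post date stays at 04:07:00 while the GMT value moves back to 03:07:00, which is the distinction between wp:post_date and wp:post_date_gmt.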

In conclusion

I really don’t want to give up email addresses and IP addresses, which leaves me two options:

  1. Write an ASPX page that rips the content directly out of the Subtext tables and formats it as WXR, bypassing BlogML
  2. Extend the Subtext export facility to add the missing fields and transform them from there

I’ll let you know where I go from here…


From Blogger to Subtext – Export pseudo BlogML from Blogger

Getting my blog out of Blogger and into Subtext was not as easy as I’d hoped…

What is BlogML?

BlogML is an XML format designed to encapsulate a blog: its posts, comments and categories. Sounds great for transferring between blogs… Alas, while Subtext and many other engines support it, Blogger does not.

A simple category-less BlogML file without comments looks something like this:

<blog root-url="" date-created="2006-04-25T01:02:25" xmlns="" xmlns:xs="">
  <title type="text">damieng</title>
  <sub-title type="text">Random musings from Guernsey</sub-title>
  <author name="Damien Guard" email="" />
  <posts>
    <post id="113889650370084235" date-created="2006-04-20T01:53:00" date-modified="2006-04-20T01:53:00" approved="true" post-url="">
      <title type="text">Hello</title>
      <content type="text"><![CDATA[This is a blog post<br />With HTML!]]></content>
    </post>
  </posts>
</blog>

Check out the BlogML standard itself for full details, although doing so requires registration.

The first thing to do is to log in to Blogger and change the settings for your blog. Specifically, go to Formatting Settings and set Show to 999 Posts. If you have more than 999 posts you might have problems.

Setting options on Blogger

Set the Timestamp Format to 4/25/2006 10:38:00AM (obviously the date will be different, it’s the format we’re after) and set Enable Float Alignment to No.

Go to the Comments section and set the Comments Timestamp Format to the same.

Changing the Blogger template

By changing the template we can get Blogger to output something close to BlogML, but not quite there.

Paste the following block into the template area but DO NOT hit save.

<?xml version="1.0" encoding="<$BlogEncoding$>"?>

<$BlogTitle$> <![CDATA[<$BlogDescription$>]]> <![CDATA[<$BlogItemTitle$>]]> <![CDATA[<$BlogItemBody$>]]> <![CDATA[<$BlogCommentBody$>]]> <![CDATA[<$BlogCommentAuthor$>]]>

Now hit the Preview button and wait. Once complete, view source and save that somewhere. Feel free now to cancel the template change.

Patching up the bad output

The output from this template isn’t BlogML yet, but it’s not too far off. Cut out the junk before <?xml and after </blog> to get one step closer.

That leaves us with three problems:

  1. Date/time formats are incorrect for both posts and comments
  2. Comments have no titles
  3. Comment authors are in <author> tags encoded as a CDATA hyperlink instead of user-name and user-url attributes of the <comment> tag

These are all limitations of the Blogger template system but with a short XML parser and writer you should be able to fix them up.
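
As a starting point for that fix-up tool, here is a small Python sketch of the two string conversions it would need: the timestamp format set earlier, and splitting the author hyperlink into a name and URL. Both function names are hypothetical, and the regular expression assumes the author is either plain text or a single `<a href="...">name</a>` link; wiring these into a parser that walks the saved file is left out.

```python
import re
from datetime import datetime


def blogger_to_blogml_date(stamp):
    """Convert a timestamp like '4/25/2006 10:38:00AM' to '2006-04-25T10:38:00'."""
    parsed = datetime.strptime(stamp.strip(), "%m/%d/%Y %I:%M:%S%p")
    return parsed.strftime("%Y-%m-%dT%H:%M:%S")


# Assumption: the author markup is plain text or one <a href="...">name</a> link.
AUTHOR_LINK = re.compile(r'<a\s[^>]*href="([^"]*)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)


def split_comment_author(markup):
    """Return (user_name, user_url); the URL is empty when there is no link."""
    match = AUTHOR_LINK.search(markup)
    if match:
        return match.group(2).strip(), match.group(1)
    return markup.strip(), ""


print(blogger_to_blogml_date("4/25/2006 10:38:00AM"))
print(split_comment_author('<a href="http://example.com/">Jo</a>'))
```

The returned name and URL map onto the user-name and user-url attributes of the <comment> tag; comment titles would still need to be generated separately.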