From Blogger to Subtext – Export pseudo-BlogML from Blogger

Getting my blog out of Blogger.com and into Subtext was not as easy as I’d hoped…

What is BlogML?

BlogML is an XML format designed to encapsulate a blog: its posts, comments and categories. That sounds great for transferring between blog engines. Alas, while Subtext and many other engines support it, Blogger.com does not.

A simple category-less BlogML file without comments looks something like this:

<blog root-url="www.damieng.com/blog/" date-created="2006-04-25T01:02:25" xmlns="http://www.blogml.com/2006/01/BlogML" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <title type="text">damieng</title>
  <sub-title type="text">Random musings from Guernsey</sub-title>
  <author name="Damien Guard" email="[email protected]" />
  <posts>
    <post id="113889650370084235" date-created="2006-04-20T01:53:00" date-modified="2006-04-20T01:53:00" approved="true" post-url="https://damieng.com/blog/2006/04/hello.html">
      <title type="text">Hello</title>
      <content type="text"><![CDATA[This is a blog post<br />With HTML!]]></content>
    </post>
  </posts>
</blog>

Check out the BlogML standard itself for full details, although doing so requires registration.

Setting options on Blogger

The first thing to do is to log in to Blogger.com and change the settings for your blog. Specifically, go to the Formatting settings and, next to Show, enter 999 and select Posts. If you have more than 999 posts you might have problems.

Set the Timestamp Format to 4/25/2006 10:38:00AM (the date shown will be different; it’s the format we’re after) and set Enable Float Alignment to No.

Go to the Comments section and set the Comments Timestamp Format to the same.
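These US-style timestamps are not what BlogML expects (the sample above uses the form 2006-04-20T01:53:00), so they will need converting once the export is done. As a rough sketch, assuming Blogger emits them exactly as shown (month/day/year, 12-hour clock, AM/PM with no space before it), something like the following Python would handle the conversion. The function name is made up for illustration and is not part of any tool.

```python
from datetime import datetime

# Assumed Blogger output format, e.g. "4/25/2006 10:38:00AM"
BLOGGER_FORMAT = "%m/%d/%Y %I:%M:%S%p"


def blogger_to_blogml_timestamp(value: str) -> str:
    """Convert a Blogger-style timestamp to the ISO-style form used in BlogML."""
    parsed = datetime.strptime(value.strip(), BLOGGER_FORMAT)
    return parsed.strftime("%Y-%m-%dT%H:%M:%S")


print(blogger_to_blogml_timestamp("4/25/2006 10:38:00AM"))  # 2006-04-25T10:38:00
```

Note that AM/PM parsing depends on the locale Python is running under, so this is only likely to work as-is on an English-locale machine.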

Changing the Blogger template

By changing the template we can get Blogger.com to output something close to BlogML but not quite there.

Paste the following block into the template area, changing root-url to your own blog's address, but DO NOT hit save.

<?xml version="1.0" encoding="<$BlogEncoding$>"?>
<blog root-url="damieng.com" xmlns="http://www.blogml.com/2006/01/BlogML" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <title type="text"><$BlogTitle$></title>
  <sub-title type="text"><![CDATA[<$BlogDescription$>]]></sub-title>
  <author name="<$BlogOwnerFullName$>" email="<$BlogOwnerEmail$>" />
  <categories />
  <posts>
    <Blogger>
      <post id="<$BlogItemNumber$>" date-created="<$BlogItemDateTime$>" date-modified="<$BlogItemDateTime$>" approved="true" post-url="<$BlogItemPermalinkURL$>">
        <title type="text"><![CDATA[<$BlogItemTitle$>]]></title>
        <content type="text"><![CDATA[<$BlogItemBody$>]]></content>
        <comments>
          <BlogItemComments>
            <comment id="<$BlogCommentNumber$>" date-created="<$BlogCommentDateTime$>" date-modified="<$BlogCommentDateTime$>" approved="true" user-name="" user-url="">
              <title type="text"></title>
              <content type="text"><![CDATA[<$BlogCommentBody$>]]></content>
              <author><![CDATA[<$BlogCommentAuthor$>]]></author>
            </comment>
          </BlogItemComments>
        </comments>
      </post>
    </Blogger>
  </posts>
</blog>

Now hit the Preview button and wait. Once the preview has loaded, view the page source and save it somewhere. You can now cancel the template change.

Patching up the bad output

The output from this template isn’t BlogML yet but it’s not too far off. Cut out the junk before <?xml and after </blog> to get one step closer.
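If you would rather not trim by hand, here is a minimal Python sketch of the same step. It assumes the saved source contains the <?xml declaration once and that the last </blog> in the file is the one we want. The function name trim_to_blogml is made up for illustration.

```python
def trim_to_blogml(page_source: str) -> str:
    """Keep only the text from '<?xml' up to and including the last '</blog>'."""
    closing_tag = "</blog>"
    start = page_source.find("<?xml")
    end = page_source.rfind(closing_tag)
    if start == -1 or end == -1 or end < start:
        raise ValueError("could not find '<?xml ... </blog>' in the saved source")
    return page_source[start:end + len(closing_tag)]


# Tiny stand-in for the saved preview source
saved = 'preview junk<?xml version="1.0"?><blog></blog><div>more junk</div>'
print(trim_to_blogml(saved))
```

Raising an error when either marker is missing seemed safer than silently returning the whole page, since a half-trimmed file will only fail later at import time.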

That leaves us with three problems:

  1. Date/time formats are incorrect for both posts and comments
  2. Comments have no titles
  3. Comment authors are in <author> tags, encoded as a CDATA hyperlink, instead of in the user-name and user-url attributes of the <comment> tag

These are all limitations of the Blogger template system, but a short XML parser and writer should be able to fix them up.
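As an illustration of what such a fix-up might look like, here is a hedged Python sketch covering problems 2 and 3 only (the dates would need similar handling on the date-created and date-modified attributes). Several things here are assumptions rather than facts about Blogger or Subtext: that the author link comes out as a plain <a href="...">Name</a>, that "re: " plus the post title is an acceptable comment title, and the function name fix_comments itself.

```python
import re
import xml.etree.ElementTree as ET

NS = "http://www.blogml.com/2006/01/BlogML"
ET.register_namespace("", NS)  # write BlogML back out as the default namespace

# Assumed shape of Blogger's author output: <a href="URL">Name</a>
LINK = re.compile(r'<a\s[^>]*href="([^"]*)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)


def fix_comments(blogml: str) -> str:
    """Move comment authors into attributes and give empty comment titles a value."""
    root = ET.fromstring(blogml)
    for post in root.findall(f".//{{{NS}}}post"):
        post_title = (post.findtext(f"{{{NS}}}title", default="") or "").strip()
        for comment in post.findall(f".//{{{NS}}}comment"):
            author = comment.find(f"{{{NS}}}author")
            if author is not None:
                raw = (author.text or "").strip()
                match = LINK.search(raw)
                if match:
                    comment.set("user-url", match.group(1))
                    comment.set("user-name", match.group(2).strip())
                else:
                    # No link found, so treat the whole text as the name
                    comment.set("user-name", raw)
                comment.remove(author)
            title = comment.find(f"{{{NS}}}title")
            if title is not None and not (title.text or "").strip():
                title.text = "re: " + post_title  # assumed title convention
    return ET.tostring(root, encoding="unicode")
```

One side effect to be aware of: ElementTree reads CDATA sections as ordinary text and writes them back escaped (&lt; instead of <), which is equivalent to an XML parser but looks different in a text editor.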

[)amien
