Gillius's Programming

Conversion to Movable Type - Part 2 - Page Import

Using the XML-RPC interfaces I mentioned in part 1, I needed to parse my HTML content, strip it of the original "template", and upload it to my development Movable Type instance. I used Java because it was the language I was most familiar with, and XML-RPC because that is what MT provides.

This post is the second of a three-part series; the first part is here.
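Stripping the template amounted to keeping only the page-specific part of each file and dropping the shared header, navigation, and footer. The following is a minimal sketch of that idea, not my actual importer. It assumes hypothetical <!-- BEGIN CONTENT --> and <!-- END CONTENT --> marker comments around the content and takes the title from the <title> element; a real site will likely need more specific rules.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls the title and the body content out of a templated page. */
final class TemplateStripper {
    // Assumed marker comments surrounding the page-specific content.
    static final String BEGIN = "<!-- BEGIN CONTENT -->";
    static final String END = "<!-- END CONTENT -->";

    // DOTALL lets the title span several lines.
    private static final Pattern TITLE =
            Pattern.compile("<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the trimmed text of the first <title> element, or null if there is none. */
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    /** Returns the HTML between the markers, or null if either marker is missing. */
    static String extractContent(String html) {
        int start = html.indexOf(BEGIN);
        if (start < 0) return null;
        start += BEGIN.length();
        int end = html.indexOf(END, start);
        if (end < 0) return null;
        return html.substring(start, end).trim();
    }
}
```

Returning null instead of guessing makes it easy to set aside pages that don't match the expected layout for manual review.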

Upload Interface

The key to uploading pages is the wp.newPage method, which MT supports and which is best documented in the WordPress API documentation. The following are the parts relevant to my project, along with some MT-specific information I gleaned from the source code, because I never could find a good, proper page on MT's site documenting this.

wp.newPage( blogId, userId, pass, item )

item is a struct:

If dateCreated is in the future, there appears to be code to schedule the page for future publication, but I did not test this.

Other fields I didn't care about but could set (for others who might care):

The fields that can't be "empty" (such as mt_allow_comments) are set to their defaults if not explicitly specified in the XML-RPC call. There also appears to be some kind of generic mechanism to support mt_* fields, but I'm not sure what it can set; perhaps custom fields?
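To see what actually goes over the wire, here is a minimal sketch that builds the XML-RPC request body for wp.newPage by hand, using only the JDK. It follows the XML-RPC specification's encoding: an <int> for the blog ID, <string> values, a <struct> for item, and dateTime.iso8601 in the spec's 19980717T14:08:55 style. It fills in the four item fields my importer set (title, description, dateCreated, permaLink). The class name is hypothetical, and in practice an XML-RPC library generates this for you.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that hand-builds a wp.newPage XML-RPC request body. */
final class NewPageRequest {
    // XML-RPC dateTime.iso8601 values look like 19980717T14:08:55 (no time zone).
    private static final DateTimeFormatter ISO8601 =
            DateTimeFormatter.ofPattern("yyyyMMdd'T'HH:mm:ss");

    /** Escapes the characters that may not appear raw inside XML element text. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String build(int blogId, String user, String pass, String title,
                        String content, LocalDateTime created, String permaLink) {
        // Insertion order only keeps the output predictable; struct members are unordered.
        Map<String, String> item = new LinkedHashMap<>();
        item.put("title", "<string>" + escape(title) + "</string>");
        item.put("description", "<string>" + escape(content) + "</string>");
        item.put("dateCreated",
                "<dateTime.iso8601>" + ISO8601.format(created) + "</dateTime.iso8601>");
        item.put("permaLink", "<string>" + escape(permaLink) + "</string>");

        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\"?>\n<methodCall>");
        xml.append("<methodName>wp.newPage</methodName><params>");
        xml.append("<param><value><int>").append(blogId).append("</int></value></param>");
        xml.append("<param><value><string>").append(escape(user)).append("</string></value></param>");
        xml.append("<param><value><string>").append(escape(pass)).append("</string></value></param>");
        xml.append("<param><value><struct>");
        for (Map.Entry<String, String> e : item.entrySet()) {
            xml.append("<member><name>").append(e.getKey()).append("</name><value>")
               .append(e.getValue()).append("</value></member>");
        }
        xml.append("</struct></value></param></params></methodCall>");
        return xml.toString();
    }
}
```

Because the page content goes in description as a string, the HTML markup itself gets escaped in the request and arrives at MT intact.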

The permaLink was the tricky part. I thought that Movable Type folders mapped to categories (especially since they are reported back as such when you get post information), but the server only reads the category field for blog posts, not pages. By looking at the Perl source code, I determined that for pages it only looks at permaLink, and parses the folder and "basename" (called "Filename" in the web interface) out of it. Before that, I had tried setting the mt_basename property, which works great if you don't care about folders and want everything in the root.
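As I read the Perl code, the permaLink is split into a folder path and a basename. The Java sketch below models that split to make the idea concrete; it is my own illustration, not a port of MT's code, and it assumes the permaLink starts with the blog's site URL and ends with the configured file extension (".htm" on my site).

```java
/** Hypothetical model of splitting a page permaLink into folder path and basename. */
final class PermaLinkParts {
    final String folder;   // e.g. "folderA/sub", or "" for the site root
    final String basename; // e.g. "the_page"

    private PermaLinkParts(String folder, String basename) {
        this.folder = folder;
        this.basename = basename;
    }

    static PermaLinkParts split(String permaLink, String siteUrl, String extension) {
        if (!permaLink.startsWith(siteUrl) || !permaLink.endsWith(extension)
                || permaLink.length() < siteUrl.length() + extension.length()) {
            throw new IllegalArgumentException("Unexpected permaLink: " + permaLink);
        }
        // Strip the site URL prefix and the extension suffix.
        String path = permaLink.substring(siteUrl.length(),
                permaLink.length() - extension.length());
        // Everything before the last '/' is the folder path; the rest is the basename.
        int slash = path.lastIndexOf('/');
        return slash < 0
                ? new PermaLinkParts("", path)
                : new PermaLinkParts(path.substring(0, slash), path.substring(slash + 1));
    }
}
```

The root case (empty folder) corresponds to what setting mt_basename alone gives you.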

I didn't test this, but from the source code it appears that the permaLink parsing only takes effect if the entry type is "page", which seems to happen only when you call wp.newPage rather than metaWeblog.newPost. I could be wrong, though.

I configured MT to use "htm" instead of "html" as the file extension, to match my existing site, and I used htm in the permaLinks too. Out of curiosity, I tried "html" and got the error message "Requested permalink ... is not available for this page".

So in the client, I have to set both the URL prefix ("http://gillius.org/") and the suffix (".htm").
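Building the permaLink on the client side is then simple string assembly. The following is a minimal sketch, assuming each page's path relative to the local copy of the site is known; the class and method names are hypothetical. It rejects files whose extension doesn't match, since MT refuses a permaLink with the wrong extension.

```java
import java.nio.file.Path;

/** Hypothetical helper that turns a local file into the permaLink to send to MT. */
final class PermaLinkBuilder {
    static String build(Path siteRoot, Path file, String siteUrl, String extension) {
        Path relative = siteRoot.relativize(file);
        // Join the path elements with '/' so the URL is also correct on Windows.
        StringBuilder url = new StringBuilder(siteUrl);
        for (int i = 0; i < relative.getNameCount(); i++) {
            if (i > 0) url.append('/');
            url.append(relative.getName(i));
        }
        String link = url.toString();
        if (!link.endsWith(extension)) {
            throw new IllegalArgumentException(
                    "Extension does not match the blog's configured extension: " + file);
        }
        return link;
    }
}
```

Here siteUrl is expected to end with "/", as "http://gillius.org/" does.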

XMLRPCServer bug

For adds and edits, the XML-RPC calls pass every basename setting, whether it comes from mt_basename, wp_slug, or permaLink, through the _apply_basename function, which checks whether the basename is unique. The problem is that it doesn't check whether it is unique within a folder, although the web interface seems to handle this. For example, folderA/the_page.htm and folderB/the_page.htm are considered "the same" by the XML-RPC interface but distinct by the web interface. When the XML-RPC interface encounters an attempt to add or edit the second "the_page.htm", it renames it based on the content, as if you hadn't given it a basename at all.
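To make the difference concrete, here is a small Java model of the two checks. It is not MT code: the global check mimics what the XML-RPC path appears to do (basename unique per blog), and the folder-scoped check mimics what the web interface appears to allow (basename unique per folder). The class is hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of basename uniqueness checks, not MT's implementation. */
final class BasenameRegistry {
    private final Set<String> basenames = new HashSet<>();                // blog-wide scope
    private final Set<List<String>> folderAndBasenames = new HashSet<>(); // per-folder scope

    /** Records a basename; returns false when the blog-wide (XML-RPC style) check collides. */
    boolean addGlobal(String basename) {
        return basenames.add(basename);
    }

    /** Records a page; returns false only if the same folder already has this basename. */
    boolean addPerFolder(String folder, String basename) {
        return folderAndBasenames.add(List.of(folder, basename));
    }
}
```

Under the blog-wide check, the second the_page collides even though it lives in a different folder, which is the point where MT generates a new basename instead.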

I tried hard to find a way around this, but in the end I had no alternative but to edit MT's source code for my import.

This even causes a problem where, if you call wp.getPage and then pass that exact, unmodified page back to wp.editPage, the permaLink and basename actually change, whereas normally they wouldn't. In my opinion, it's a bug if you can't do a "round trip".

I'm still not sure whether this is a "bug" or a feature, but since the web interface lets me do it, I don't see any reason not to allow it. This is the code in question:

    if (defined $basename) {
        # Ensure this basename is unique.
        my $entry_class = ref $entry;
        my $basename_uses = $entry_class->exist({
            blog_id  => $entry->blog_id,
            basename => $basename,
            ($entry->id ? ( id => { op => '!=', value => $entry->id } ) : ()),
        });
        if ($basename_uses) {
            $basename = MT::Util::make_unique_basename($entry);
        }

        $entry->basename($basename);
    }

I commented out all of these lines except the if statement itself and the basename setter. In MT 5.04, these were lines 180 to 188 of lib/MT/XMLRPCServer.pm.

I'm not totally sure what the right fix is, because I know virtually nothing about Perl. I would guess that one could just add the folder to the associative array in the exist call to "narrow" the search, but I'm not sure. Since I was importing my existing, valid site from files on disk, I knew that there wouldn't be any actual basename collisions, so I commented the check out for the import. Afterward, I uncommented it to leave the code in a "pristine" state.

This is one of those moments that I'm really glad for open source.

Writing the Code

I may release the importer source code at some point, in some form. I thought about making a "polished" program, but the code ended up being pretty specific to my website, and it took enough effort that I wasn't motivated to make a bullet-proof version; as it stands, my code would raise more questions than it answers. When it failed, or when I found a special case or a file it had trouble handling, I had to coax it along. It has no GUI, which would be a necessity for reviewing what it is going to import (based on your scanning rules) and whether it parsed the content properly.
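The scanning step itself is straightforward with java.nio. The following is a minimal sketch, assuming the only scanning rule is "every file under the site root with the page extension"; a real import will need site-specific exclusions, and the names here are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical scanner that lists candidate pages, relative to the site root, for review. */
final class PageScanner {
    static List<Path> findPages(Path siteRoot, String extension) throws IOException {
        // The stream from Files.walk holds directory handles, so close it when done.
        try (Stream<Path> paths = Files.walk(siteRoot)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(extension))
                        .map(siteRoot::relativize)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }
}
```

A sorted list of relative paths is easy to print and review before anything is uploaded, which partly makes up for the missing GUI.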

That said, there is a lot I can share to help others interested in writing their own. I wrote mine in Java, but Python or Groovy would work well too.

In Python you can use xmlrpclib:

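import xmlrpclib  # Python 2; in Python 3 this module is xmlrpc.client
proxy = xmlrpclib.ServerProxy("http://mt.example.com/cgi-bin/mt/mt-xmlrpc.cgi")
# item is a dict, which xmlrpclib sends as an XML-RPC struct
proxy.wp.newPage( blogId, username, password, { "title" : title, "dateCreated" : dateCreated, "permaLink" : permaLink } )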

In Groovy you can use groovy-xmlrpc:

import groovy.net.xmlrpc.*
def server = new XMLRPCServerProxy("http://mt.example.com/cgi-bin/mt/mt-xmlrpc.cgi")
server.mt.supportedMethods().join('\n')

I ultimately went with Java because at the time I was more comfortable with it than with Python. Since writing that code, I've been seriously learning Groovy, and I would have used it had I known it then.

In Java I used the org.apache.xmlrpc:xmlrpc-client:3.1.3 library, which was easily obtained from Maven. I made interfaces MetaWeblog, MovableType, and WordPress to encapsulate the methods that I cared about, and implemented them in a MovableTypeImpl, which is shown below:

import java.net.URL;
import java.util.Arrays;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

import org.apache.xmlrpc.XmlRpcException;
import org.apache.xmlrpc.client.XmlRpcClient;
import org.apache.xmlrpc.client.XmlRpcClientConfigImpl;

/** XML-RPC client for the subset of the MT API that the importer uses. */
public class MovableTypeImpl implements WordPress, MovableType, MetaWeblog {
    private final XmlRpcClient client;
    private final String username;
    private final String password;
    private final int blogId;

    public MovableTypeImpl( URL url, String username, String password, int blogId ) {
        // Point the Apache XML-RPC client at MT's mt-xmlrpc.cgi endpoint
        XmlRpcClientConfigImpl config = new XmlRpcClientConfigImpl();
        config.setServerURL( url );
        client = new XmlRpcClient();
        client.setConfig( config );

        this.username = username;
        this.password = password;
        this.blogId   = blogId;
    }

    /** Lists the XML-RPC methods the server supports. */
    public String[] supportedMethods() throws XmlRpcException {
        Object[] result = (Object[]) execute( "mt.supportedMethods" );
        return Arrays.copyOf( result, result.length, String[].class );
    }

    /** Creates a page and returns its ID; null arguments are left out of the item struct. */
    public String newPage( Date pageDate, String title,
                           String content, String link ) throws XmlRpcException {
        Map<String, Object> item = new HashMap<String, Object>();
        if ( title != null )
            item.put( "title", title );
        if ( pageDate != null )
            item.put( "dateCreated", pageDate );
        // MT derives the page's folder and basename from the permaLink
        if ( link != null )
            item.put( "permaLink", link );

        item.put( "description", content );

        return execute( "wp.newPage", blogId, username, password, item ).toString();
    }

    /** Uploads a file; the library encodes the byte[] as XML-RPC base64. */
    public String newMediaObject( String fileName, byte[] fileData ) throws XmlRpcException {
        Map<String, Object> file = new HashMap<String, Object>();
        file.put( "name", fileName );
        file.put( "bits", fileData );

        return execute( "metaWeblog.newMediaObject", blogId, username, password, file ).toString();
    }

    // Varargs wrapper around XmlRpcClient.execute(String, Object[])
    private Object execute( String method, Object... params )
            throws XmlRpcException {
        return client.execute( method, params );
    }
}

For the conclusion, see the third part, which covers asset upload.