
07 Dec 2007

Flat-file to XML Log Conversion in Python

As I recently mentioned, I’m a big fan of the “Unified Logging Format” for instant messaging logs. Unfortunately, of the two IM clients I use most – Adium on Mac OS X and CenterIM on Linux – only Adium uses it. CenterIM uses a simple flat-file format, delimited with linefeeds and formfeeds.

Since I’d really like to get all my logs in one place, in the same format, I wrote a little Python script to convert CenterIM’s flat-file format into something approximating ULF as implemented by Adium. It’s not perfect, and I’d suggest that persons with weak constitutions and functional programmers not look at the code, but it does seem to work fairly well on my logs. To get an idea of what it does, this is a snippet of a CenterIM log, showing an incoming message followed by an outgoing reply:

^L
IN
MSG
1190126325
1190126325
hey
^L
OUT
MSG
1190126383
1190126383
how's your day going?

Here “^L” stands for the ASCII form-feed character (0x0C). In Adium’s format (ULF), the same exchange might appear as:

<chat account="joeblow" service="AIM" version="0.4">
  <message sender="janedoe" time="2007-09-18T14:38:45-0000">hey</message>
  <message sender="joeblow" time="2007-09-18T14:39:43-0000">how's your day going?</message>
</chat>
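
For readers who would rather see the idea than the script itself, here is a minimal sketch of the conversion in modern Python. It is not the actual cimconverter.py. It assumes every record is a form feed followed by a direction line, a type line, two timestamp lines, and then the message text; that the timestamps are Unix epoch seconds; and that times are written out in UTC. The function names parse_records and to_ulf are made up for illustration.

```python
from datetime import datetime, timezone
from xml.sax.saxutils import escape, quoteattr

FORM_FEED = "\x0c"


def parse_records(text):
    """Yield (direction, unix_time, body) for each MSG record in a CenterIM log."""
    for chunk in text.split(FORM_FEED):
        lines = chunk.strip("\n").split("\n")
        # Assumed layout: direction, record type, two timestamps, then the text.
        if len(lines) < 5 or lines[1] != "MSG":
            continue
        direction, _type, first_stamp, _second_stamp = lines[:4]
        # Only the first timestamp is used; the message may span several lines.
        yield direction, int(first_stamp), "\n".join(lines[4:])


def to_ulf(text, yoursn, theirsn, service):
    """Build an Adium-style <chat> document from CenterIM log text."""
    out = ['<chat account=%s service=%s version="0.4">'
           % (quoteattr(yoursn), quoteattr(service))]
    for direction, stamp, body in parse_records(text):
        # IN means the other person sent it; anything else is treated as ours.
        sender = theirsn if direction == "IN" else yoursn
        when = datetime.fromtimestamp(stamp, tz=timezone.utc)
        time_attr = when.strftime("%Y-%m-%dT%H:%M:%S-0000")
        out.append("  <message sender=%s time=%s>%s</message>"
                   % (quoteattr(sender), quoteattr(time_attr), escape(body)))
    out.append("</chat>")
    return "\n".join(out)
```

Escaping the message body matters more than it looks: real logs are full of ampersands and angle brackets, and a single unescaped one is enough to make the XML unreadable to a strict parser.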

The major limitations the converter suffers from are a consequence of the differing structure of CenterIM’s logs and Adium’s. CenterIM stores chats in a single file for each contact, with one record for each message sent or received. ULF/Adium use one file per ‘conversation,’ which is apparently all the messages sent or received in a single window (i.e. when you close the window, a new conversation begins on the next message). CenterIM has no concept of conversations, only messages. This means that when you convert a CenterIM log to Adium’s format, Adium sees it as one long conversation, and it appears this way in Adium’s log viewer.
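
One possible workaround, which my script does not implement, would be to start a new conversation whenever the gap between two consecutive messages exceeds some idle threshold. The sketch below assumes the messages are tuples whose first element is a Unix timestamp and that they are already in time order. The 30-minute cutoff and the name split_conversations are arbitrary choices for illustration, not anything Adium or CenterIM defines.

```python
def split_conversations(messages, max_gap=30 * 60):
    """Group time-ordered messages into runs separated by more than max_gap seconds.

    Each message is a tuple whose first element is a Unix timestamp.
    """
    conversations = []
    previous = None
    for message in messages:
        stamp = message[0]
        if previous is None or stamp - previous > max_gap:
            conversations.append([])  # gap too long: begin a new conversation
        conversations[-1].append(message)
        previous = stamp
    return conversations
```

Each resulting group could then be written to its own file, which is roughly how Adium lays out its logs. This is only a guess at conversation boundaries, though, since CenterIM never recorded when a window was actually closed.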

Also, while Adium and ULF store the account names of the conversation participants in the logs, CenterIM simply marks messages as ‘IN’ or ‘OUT’, requiring you to look at the log’s enclosing directory to get the name of the participant. Currently, my script doesn’t do this: it just expects the sender’s and receiver’s account names as command-line arguments.

The syntax is:
$ python cimconverter.py filename yoursn theirsn service
Where filename is the name of the log file you want to convert (usually “history”), yoursn is your screen or account name, theirsn is the account name of the person you had the conversation with, and service is the name of the IM service (AIM, MSN, etc.).
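
For anyone writing something similar today, the four positional arguments map neatly onto the standard library's argparse module. This is a sketch of how that could look, not how the original script reads its arguments; the function name parse_args and the help strings are my own.

```python
import argparse


def parse_args(argv=None):
    """Parse the four positional arguments described above."""
    parser = argparse.ArgumentParser(
        prog="cimconverter.py",
        description="Convert a CenterIM history file to Adium-style XML.",
    )
    parser.add_argument("filename", help='CenterIM log file, usually "history"')
    parser.add_argument("yoursn", help="your screen or account name")
    parser.add_argument("theirsn", help="the other person's account name")
    parser.add_argument("service", help="IM service name, e.g. AIM or MSN")
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)
```

Calling parse_args(["history", "joeblow", "janedoe", "AIM"]) returns a namespace with filename, yoursn, theirsn and service attributes, and running the script with a missing argument prints a usage message instead of a traceback.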

At some point, I will try to fix it so it can grab more of the parameters (at least theirsn and service) from the history file’s path. But for now it’s just the bare minimum. I can’t guarantee that the output actually conforms to the ULF specification, since to my knowledge no formal specification exists; however, it does produce output that Adium’s log viewer processes and displays, and that’s basically the de facto standard at the moment.
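
As a rough idea of what grabbing those parameters from the path might look like, here is a sketch. It rests on an assumption you should check against your own CenterIM directory: that each contact's history file sits in a directory named with a one-letter protocol prefix followed by the account name. The prefix table, the function name guess_contact, and the example paths are all hypothetical.

```python
from pathlib import PurePosixPath

# ASSUMPTION: one-letter directory prefixes per protocol. Verify these against
# your own CenterIM directory before relying on them.
ASSUMED_PREFIXES = {"a": "AIM", "m": "MSN", "y": "Yahoo", "j": "Jabber"}


def guess_contact(history_path, prefixes=ASSUMED_PREFIXES):
    """Return (theirsn, service) guessed from a path like .../ajanedoe/history.

    Returns None when the enclosing directory name does not match the
    assumed prefix scheme, so the caller can fall back to explicit arguments.
    """
    dirname = PurePosixPath(history_path).parent.name
    if len(dirname) < 2 or dirname[0] not in prefixes:
        return None
    return dirname[1:], prefixes[dirname[0]]
```

With the table above, a made-up path such as /home/me/.centerim/ajanedoe/history would yield ("janedoe", "AIM"), while anything that doesn't fit the pattern returns None rather than a wrong guess. Your own screen name still has to come from somewhere else, since it never appears in the path.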

This entry was converted from an older version of the site; if desired, it can be viewed in its original format.