Python Conversion Script

I’ve always been the type of person to keep logs of IM conversations, whether to find a code snippet someone pasted to me or to check what someone said. When I started using both Ubuntu and Windows, I switched to Pidgin on both so the UI would be similar. Pidgin stores its logs in a different place on each OS, but I can easily copy them from one to the other.

The same could not be said for my existing MSN logs. They were in some convoluted XML format, and there was no way that Pidgin was going to parse those without a fight.

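So I wrote a Python script to convert the XML to plain text. It takes the log file and the contact’s address as arguments, and writes one text file per session into logs/<address>/ (the logs directory must already exist):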

$ python converter.py ./nameoflog.xml addressofuser@hotmail.com

#! /usr/bin/env python
import sys
import os
import codecs
from datetime import datetime
from time import localtime, mktime, strftime
from xml.dom.minidom import parse

fSessions = {}

def parseMessages(msgs):
    for msg in msgs:
        handleSession(msg.getAttribute('SessionID'), msg)

def handleSession(sid, msg):
    # Build the file name and header only the first time a SessionID is seen.
    if sid not in fSessions:
        dt = msg.getAttribute('DateTime')
        # Drop the last five characters (presumably fractional seconds and 'Z').
        t = datetime.strptime(dt[:-5], "%Y-%m-%dT%H:%M:%S")
        logtime = localtime(mktime(t.timetuple()))
        # Index 8 is tm_isdst: nonzero when daylight saving time applies.
        dst = 'PST'
        if logtime[8]:
            dst = 'PDT'
        date = t.ctime()
        # Element 0 is the output file name (UTC offset hardcoded to -0700),
        # element 1 is the header line.
        fSessions[sid] = [
            strftime("%Y-%m-%d%H%M%S-0700%Z", logtime)+'.txt',
            'Conversation with '+sys.argv[2]+' at '+date+' '+dst+' on Account (msn)\r\n',
        ]
    # Append this message to its session as "(time) name: text".
    time = msg.getAttribute('Time')
    name = msg.getElementsByTagName('From')[0].getElementsByTagName('User')[0].getAttribute('FriendlyName')
    text = msg.getElementsByTagName('Text')[0].firstChild.data
    line = '('+time+') '+name+': '+text+'\r\n'
    fSessions[sid].append(line)

def handleMain(dom):
    msgs = dom.getElementsByTagName('Message')
    parseMessages(msgs)

dom = parse(sys.argv[1])

handleMain(dom)

os.chdir("logs")

os.mkdir(sys.argv[2])

os.chdir(sys.argv[2])

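for s in fSessions:
    # Binary mode so the BOM and the '\r\n' line endings are written as-is.
    f = open(fSessions[s][0], 'wb')
    f.write(codecs.BOM_UTF8)
    # Element 0 is the file name; the rest are the header and message lines.
    for ln in range(1, len(fSessions[s])):
        f.write(fSessions[s][ln].encode("utf-8"))
    f.close()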
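
If you don’t have an MSN log handy, the parsing and grouping step can be tried in isolation. The following is only a rough sketch: the sample XML is an assumption, reconstructed from the attributes the script reads (SessionID, DateTime, Time, FriendlyName, Text), and real logs carry more than this. The function name group_by_session and the names in the sample are made up for illustration, and unlike the script it tolerates an empty Text element.

```python
from xml.dom.minidom import parseString

# Hypothetical sample: only the attributes and elements the script reads.
SAMPLE = """<Log>
  <Message DateTime="2008-03-01T18:30:05.123Z" Time="10:30:05 AM" SessionID="1">
    <From><User FriendlyName="Alice"/></From>
    <Text>hello</Text>
  </Message>
  <Message DateTime="2008-03-01T18:30:09.456Z" Time="10:30:09 AM" SessionID="1">
    <From><User FriendlyName="Bob"/></From>
    <Text>hi there</Text>
  </Message>
  <Message DateTime="2008-03-02T02:15:00.000Z" Time="6:15:00 PM" SessionID="2">
    <From><User FriendlyName="Alice"/></From>
    <Text>back again</Text>
  </Message>
</Log>"""


def group_by_session(xml_text):
    """Return {session id: [formatted lines]} for every Message element."""
    sessions = {}
    dom = parseString(xml_text)
    for msg in dom.getElementsByTagName('Message'):
        sid = msg.getAttribute('SessionID')
        user = msg.getElementsByTagName('From')[0].getElementsByTagName('User')[0]
        name = user.getAttribute('FriendlyName')
        # An empty <Text/> has no child node, so guard against None.
        text_node = msg.getElementsByTagName('Text')[0].firstChild
        text = text_node.data if text_node is not None else ''
        line = '(%s) %s: %s\r\n' % (msg.getAttribute('Time'), name, text)
        sessions.setdefault(sid, []).append(line)
    return sessions


if __name__ == '__main__':
    for sid, lines in sorted(group_by_session(SAMPLE).items()):
        print(sid, len(lines))
```

Each SessionID ends up with its own list of lines, which is what lets the script write one Pidgin log file per conversation.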

Admittedly, the code is probably a nightmare from a technical standpoint, but it’s tried and tested: it just converted a year’s worth of XML logs into Pidgin-friendly plain text.
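
One part that is easy to pull out and reuse is the file-writing step. Here is a minimal sketch, assuming you want the same output as the script: a UTF-8 byte order mark followed by the lines, each already ending in '\r\n'. The name write_session is hypothetical, and the file is opened in binary mode so the line endings are written exactly as given on any OS.

```python
import codecs
import os
import tempfile


def write_session(path, lines):
    """Write lines to path as UTF-8 with a leading byte order mark."""
    # Binary mode: bytes go out exactly as encoded, with no newline translation.
    with open(path, 'wb') as f:
        f.write(codecs.BOM_UTF8)
        for line in lines:
            f.write(line.encode('utf-8'))


if __name__ == '__main__':
    # Write into a throwaway directory so nothing real is overwritten.
    target = os.path.join(tempfile.mkdtemp(), 'example.txt')
    write_session(target, ['Conversation header\r\n',
                           '(10:30:05 AM) Alice: h\u00e9llo\r\n'])
    with open(target, 'rb') as f:
        print(repr(f.read()))
```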
