I decided to do a little playing around earlier this weekend with Python and CGI scripts. Just for something to do, I kludged together a little comment form for this site. It’s not yet operational — I still haven’t figured out how to get ReCaptcha working via a CGI here on the SDF — but it hopefully will show up some day.

Anyway, I ran into a weird issue when trying to write to an “mbox”-format mail spool file using Python. Basically, rather than actually sending email from within my CGI script, I instead just wanted to take the user’s form input and write it to an mbox-style spool file somewhere on the filesystem, for later perusal using an MUA.

In theory, this should be fairly simple. Python’s standard library includes a module called “mailbox” that’s purpose-built for working with a variety of spool/mailbox file types, and can add messages to them with ease. Unfortunately, I can’t seem to get it to work right; specifically, the message envelope delimiters don’t seem to be getting written correctly.

In an mbox-format spool file, each message is delimited by a string consisting of a newline, the word “From”, and a space. What comes after the word “From” isn’t really that important, but typically it’s the actual ‘From’ address followed by a timestamp. The crucial part in all this is that, with the exception of the very first message in an mbox file, the delimiter line that begins each message must be preceded by a blank line.

In other words, when writing new messages to an mbox file, you either need to always start by writing a newline, or else be religious about ending the text of each message with no fewer than two newline characters (and check that they’re actually there), in order to guarantee a blank line at the end. (According to the Qmail docs, the blank line is considered part of the end of the preceding message, rather than part of the ‘From_’ delimiter.)
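To make that rule concrete, here is a minimal sketch of appending to an mbox file by hand, without the mailbox module. The function name `append_to_mbox` and its parameters are made up for illustration; it does no locking and no “From ” escaping inside the body, so treat it as a demonstration of the blank-line rule rather than a complete mbox writer.

```python
import os


def append_to_mbox(path, from_line, message_text):
    """Append one message, making sure a blank line precedes its From_ line.

    Hypothetical helper: no file locking and no mangling of body lines
    that happen to start with "From ".
    """
    data = message_text.encode("utf-8")
    if not data.endswith(b"\n"):
        data += b"\n"  # the message itself must end its last line

    with open(path, "ab+") as f:
        # Look at the last two bytes already in the file.
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - 2))
        tail = f.read(2)

        if size == 0 or tail == b"\n\n":
            padding = b""        # empty file, or blank line already present
        elif tail.endswith(b"\n"):
            padding = b"\n"      # last line is terminated, but no blank line
        else:
            padding = b"\n\n"    # last line is not even terminated

        # In append mode the write always lands at the end of the file.
        f.write(padding + from_line.encode("utf-8") + b"\n" + data)
```

Calling it twice on the same path should leave exactly one blank line between the first message’s last line and the second message’s “From ” line, whether or not the first message ended with a newline.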

Supposedly, when you use Python’s mailbox.mboxMessage class in conjunction with mailbox.mbox to create message objects and write them to a file, this should all be handled. However, it doesn’t seem to be working for me.

The code looks something like this (similar lines removed for clarity):

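import datetime
import mailbox

mailmsg = mailbox.mboxMessage()
mailmsg['To'] = 'Kadin'
mailmsg['From'] = formdata['from'].value
# Other headers removed...
mailmsg.set_payload(formdata['message'].value)

# One spool file per day, created on first use
mboxfile = mailbox.mbox('/tmp/' + str(datetime.date.today()) + '.mbox', factory=None, create=True)
mboxfile.lock()
mboxfile.add(mailmsg)
mboxfile.flush()   # make sure the message is actually written out
mboxfile.unlock()  # note the parentheses; a bare "mboxfile.unlock" never calls the method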

From my reading of the documentation and some similar code samples, this should produce a correctly-formatted mbox file — but it doesn’t. Instead, it produces this:

From MAILER-DAEMON Sun Aug 31 06:48:30 2008
To: Kadin
From: Testuser
Subject: FORMMAIL:Test Subject
Date: Sun Aug 31 02:48:30 2008
Reply-To: test@test.example

Test message would go here.
From MAILER-DAEMON Sun Aug 31 06:48:46 2008
To: Kadin
From: Testuser2
Subject: FORMMAIL:Test Subject 2
Date: Sun Aug 31 02:48:46 2008
Reply-To: test2@test.example

Another message would go here.

Notice that there’s no empty line between the two messages? That means that when the mbox file is parsed by most applications, they don’t see all the messages in the box. Instead, they simply assume that (since there are no valid delimiters) there’s just one really long message, and display it as such.
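As a rough illustration of why a strict reader sees only one message, here is a small sketch of a counter that accepts a “From ” line as a delimiter only at the start of the file or right after a blank line. The function `count_mbox_messages` is hypothetical and much simpler than what a real MUA does; some readers are more lenient about the blank line.

```python
def count_mbox_messages(text):
    """Count messages, accepting a "From " line only after a blank line.

    Hypothetical strict reader: the start of the file counts as a valid
    position for the first delimiter.
    """
    count = 0
    previous_blank = True
    for line in text.splitlines():
        if line.startswith("From ") and previous_blank:
            count += 1
        previous_blank = (line == "")
    return count


# Shortened version of the output shown above (most headers dropped).
broken = (
    "From MAILER-DAEMON Sun Aug 31 06:48:30 2008\n"
    "To: Kadin\n"
    "\n"
    "Test message would go here.\n"
    "From MAILER-DAEMON Sun Aug 31 06:48:46 2008\n"
    "To: Kadin\n"
    "\n"
    "Another message would go here.\n"
)
# Same text with a blank line inserted before the second delimiter.
fixed = broken.replace("here.\nFrom ", "here.\n\nFrom ")

print(count_mbox_messages(broken))  # 1
print(count_mbox_messages(fixed))   # 2
```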

While I think I might be able to fix this by just adding a couple of newlines onto the entered text before it gets incorporated into the message object’s payload, that doesn’t seem like how things should have to work. Unless I’m just misunderstanding the mbox format (there are enough varieties of it, so it’s possible), it doesn’t seem like that ought to be required.
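For reference, the workaround would look roughly like the sketch below: pad the payload so it always ends in a blank line before handing it to the mailbox module. The helper name `add_with_blank_line` and its parameters are invented for this example, and the extra padding may be unnecessary (though harmless) if the installed version of the module already writes the blank line itself.

```python
import mailbox


def add_with_blank_line(mbox_path, sender, body):
    """Add a message whose payload is forced to end with a blank line.

    Hypothetical helper illustrating the workaround; headers are cut
    down to the two used in the original snippet.
    """
    msg = mailbox.mboxMessage()
    msg["To"] = "Kadin"
    msg["From"] = sender
    # Strip whatever newlines the user typed, then add exactly two.
    msg.set_payload(body.rstrip("\n") + "\n\n")

    box = mailbox.mbox(mbox_path, factory=None, create=True)
    box.lock()
    try:
        box.add(msg)
        box.flush()
    finally:
        box.unlock()
        box.close()
```

With this in place, every message body ends in a blank line regardless of what the form submitted, so the next “From ” delimiter always has one in front of it.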

Most likely, I’m doing something wrong, but I can’t seem to figure out what … time to throw in the towel and come back to it tomorrow.