What’s pymsgauth?

Posting to a mailing list is usually, but not always, as simple as sending an email. I’m on a few lists where an automated “secretary” responds to each post, requiring me to reply before the post is allowed through. This cuts down on both spam and ease of participation.

A fellow participant on these lists wrote pymsgauth to automatically handle these mailing list confirmation notices. With pymsgauth configured on my mail server, I can post to the list and be done. (Opportunities for automation are one big reason why I run my own mail server.)

Why change the code?

pymsgauth was written for Python 1.x, was last updated by its author in 2003, and is unlikely to receive further updates. Thanks to Python’s compatibility strategy, pymsgauth continues to run well under 2.7. But it doesn’t run under Python 3 at all. And 2.7’s days are numbered.

Given the option, I’d rather deal with this well before the clock runs out. And since I recently wrote some code targeting both Python 2.7 and 3, now seemed like a good time for my brain to do the same for pymsgauth.

In order for it to modify the right outbound messages and respond to the right inbound ones, all my email flows through pymsgauth. Since my email is important to me — did I mention going to some effort to run my own server? — pymsgauth’s reliability is important to me.

As usual with legacy code, my goal here was to obtain the desired change in observable behavior with just enough safety, just enough changed code, and just enough learning along the way. No more than necessary.

What needed to change?

Zeroth, I created a git repository and tagged the last release of pymsgauth.

First, I ran 2to3. It made many of the obvious changes, such as exception syntax, explicit calls to list(), and replacing

if mydict.has_key(mykey)

with

if mykey in mydict

(It missed a few things I assumed it’d catch, as I discovered later!)
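To make those three kinds of rewrite concrete, here is a small sketch. The dictionary and function names are made up for illustration and are not taken from pymsgauth. The converted forms run under both 2.7 and 3:

```python
# Hypothetical data, for illustration only; not from pymsgauth.
settings = {'log_file': '/tmp/example.log', 'verbose': '1'}


def lookup(mydict, mykey):
    # Python 2 only: if mydict.has_key(mykey):
    if mykey in mydict:
        return mydict[mykey]
    return None


def sorted_keys(mydict):
    # In Python 3, keys() returns a view rather than a list,
    # so 2to3 wraps it in list() wherever a real list is needed.
    keys = list(mydict.keys())
    keys.sort()
    return keys


def to_int(text):
    try:
        return int(text)
    # Python 2 only: except ValueError, e:
    except ValueError as e:
        return None
```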

Then I entered the try-see-fix cycle: try the code under Python 3, see the next error, fix it.

The first errors came from the config file parser. It was subclassing UserDict (part of Python 2’s standard library) and the import UserDict statement was failing. I guessed this meant UserDict was gone in Python 3 and I’d have to learn enough about its behavior to subclass Python’s dict instead. Luckily, I guessed wrong. UserDict merely moved, into the collections module. I tweaked the imports, the type instance-check syntax, and (almost all) the string method calls, and stopped getting errors from ConfParser.py.
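One common way to handle a moved module across both versions is a guarded import. This is a sketch of the idea, not necessarily how the patch does it, and ExampleConfig is a hypothetical stand-in rather than the real ConfParser class:

```python
try:
    # Python 2: UserDict lives in its own module.
    from UserDict import UserDict
except ImportError:
    # Python 3: the class moved into collections.
    from collections import UserDict


class ExampleConfig(UserDict):
    """Hypothetical stand-in for a config parser; not pymsgauth's ConfParser."""

    def get_stripped(self, key, default=''):
        # self.data is the plain dict that UserDict wraps.
        value = self.data.get(key, default)
        # Calling the method on the string itself works in both 2.7 and 3.
        return value.strip()


conf = ExampleConfig()
conf['name'] = '  value  '
```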

The next batch of errors came from pymsgauth.py itself. It was failing to import rfc822. This time, unluckily, I was right: that module was gone, replaced in Python 3 by something called email.

How to switch from rfc822 to email?

The interfaces were different. Among other things, rfc822 had a file pointer we were using to read a line at a time. email doesn’t.

I found a Python 3 port of rfc822. Maybe I could bundle it with pymsgauth, or add an external dependency. It looked experimental, though. Clearly better to avoid if possible.

So I searched for all instances of rfc822.Message, and all methods that were being called on them, and (under Python 2.7) wrote tests to characterize pymsgauth’s expectations.

Then I extracted an adapter class RFC822Message (right there in the test file) with all the same methods, trivially delegating to rfc822.

Since email is already available under Python 2.7, I replaced import rfc822 with import email and figured out how to make the tests pass with the new delegate.

Tests passed equally well with Python 3, modulo a few warnings (fixed). So I moved my RFC822Message class out of the test file and into pymsgauth.py, where I replaced all three instances of rfc822.Message with RFC822Message.
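The shape of such an adapter might look like the sketch below. Which methods pymsgauth really calls is an assumption on my part here: getheader() and getaddr() are borrowed from the old rfc822.Message interface as plausible examples, and the sample message is invented.

```python
import email
import email.utils
import io


class RFC822Message(object):
    """Sketch of an adapter: rfc822.Message-style methods, backed by email.

    The method set is an assumption for illustration, not a copy of the patch.
    """

    def __init__(self, fp):
        # rfc822.Message took an open file; email can parse one too.
        self._msg = email.message_from_file(fp)

    def getheader(self, name, default=None):
        # Header lookup in email is case-insensitive, as it was in rfc822.
        return self._msg.get(name, default)

    def getaddr(self, name):
        # rfc822 returned a (realname, address) pair, or (None, None)
        # when the header was absent.
        value = self._msg.get(name)
        if value is None:
            return (None, None)
        return email.utils.parseaddr(value)


# Invented sample message, for illustration only.
raw = u"From: Alice <alice@example.org>\nSubject: Hello\n\nBody text\n"
msg = RFC822Message(io.StringIO(raw))
```

Callers keep talking to one small class, so the choice of rfc822 or email stays hidden behind it.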

Not bad.

Done yet?

Nope, on to the next error. Can’t import popen2. I bounced confusedly around the docs and eventually understood the one-line change to replace popen2.Popen3() with subprocess.Popen().
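For anyone making the same switch, here is roughly what piping text through a child process looks like with subprocess. The run_filter helper and the stand-in child command are hypothetical, not pymsgauth’s code:

```python
import subprocess
import sys


def run_filter(args, input_text):
    """Feed input_text to a child process; return (exit_status, output).

    Hypothetical helper. With popen2 this would have been a Popen3 object
    plus writes to its .tochild and reads from its .fromchild.
    """
    child = subprocess.Popen(
        args,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text-mode pipes under both 2.7 and 3
    )
    output, _ = child.communicate(input_text)
    return child.returncode, output


# Stand-in child process: this same interpreter, upper-casing its stdin.
status, output = run_filter(
    [sys.executable, '-c',
     'import sys; sys.stdout.write(sys.stdin.read().upper())'],
    'hello\n',
)
```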

While there, I had a false start. Based on my recent experience developing an 8-bit-clean SMTP proxy, I thought I’d want to open streams as binary and use Python’s bytes literals, like b'this is a sequence of bytes'. But that started looking like too much learning and too much changed code. pymsgauth had always worked well enough with Unicode as it was. I reverted to the previous behavior, in a way that worked across Python 2.7 and 3, by adding this at the top:

import sys

# Python 2 only: wrap stdin so reads yield decoded text, as Python 3 does.
if sys.version_info[0] < 3:
    import codecs
    sys.stdin = codecs.getreader('utf-8')(sys.stdin)

Next: can’t import sha. That one was easy. 2.7 and 3 both have hashlib and it has a sha1().
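The replacement is nearly mechanical. A minimal sketch, where sha1_hex is a hypothetical helper name rather than anything in pymsgauth:

```python
import hashlib


def sha1_hex(data):
    # Python 2 only: import sha; sha.new(data).hexdigest()
    # hashlib wants bytes under Python 3, so encode text explicitly first.
    if not isinstance(data, bytes):
        data = data.encode('utf-8')
    return hashlib.sha1(data).hexdigest()
```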

Before running on my server, I manually tested 3 of the 4 programs: pymsgauth-mail, pymsgauth-filter, and pymsgauth-clean. They ran under both Python 2.7 and 3, with no apparent errors, and with the observable behavior I expected. And I sort of tested pymsgauth-confirm, but to be really sure, it would need to run on my real mail server and do its real work.

How about now?

Almost.

I tried announcing my new patch to one of the mailing lists that issues these confirmation notices. The notice showed up in my inbox, for the first time in a while. This was disconcerting, but ideal: I had failed to announce a patch that was evidently not quite working.

With verbose logging, I saw I’d missed converting a few calls to functions in the string module, of the string.strip(s) kind, into methods on the string itself, of the s.strip() kind. Fixed.
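This class of fix is easy to miss because the old spelling still works under 2.7. A made-up example of the before and after (parse_header_line is hypothetical, not from pymsgauth):

```python
def parse_header_line(line):
    """Split 'Name: value' into a lower-cased name and a stripped value.

    Hypothetical example; not taken from pymsgauth.
    """
    # Python 2 only, using functions from the string module:
    #   import string
    #   name, value = string.split(line, ':', 1)
    #   return string.lower(string.strip(name)), string.strip(value)
    # Python 2.7 and 3, using methods on the string itself:
    name, value = line.split(':', 1)
    return name.strip().lower(), value.strip()
```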

I tried sending my announcement again and watched the logs. pymsgauth-filter added the magic token to the headers, pymsgauth-confirm found it and auto-replied, and the only message that appeared in my inbox was the one I had sent to the mailing list.

Today’s Legacy Code Lessons

I could have interpreted YAGNI to mean “wait until you’re having a problem”. Maybe Python 2.7 will get a stay of execution, such that other people will have more time to consider fixing it themselves. Or maybe, compared to the other things I need to get done, I can’t prioritize this one right now.

As it happens, I often have free time in the mornings, and mitigating a risk is my favorite use of slack time.

If my goal had been zero change in observable behavior, then the way I accomplished it was terribly wasteful. I could have changed zero code. My goal was zero change to observable behavior later, despite a known source of impending change. (Or to start making other plans well in advance, if my goal had proved prohibitively expensive.)

The Adapter pattern isolates dependencies. I use it pretty eagerly whenever I take on a new dependency. It’s also extremely useful for minimizing the impact of replacing an existing one.

Developing a class directly in the test file comes from TDD As If You Meant It, where we wait for application code to really need our new class before making it available in its own file. This wasn’t TDD — I wasn’t seeking design feedback, one test at a time — but I knew what I wanted our application to need from our new class. Flipping from file to file would have slowed me down. So I avoided it.

In the end:

  • I didn’t have to understand much of pymsgauth’s code
  • I didn’t have to change much of it, either
  • I got fairly safely and cheaply where I needed to go

Here’s my patch to pymsgauth.

Until next time, I hereby declare Legacy Code Success!