I’ve been running a public web server since 1999, when my employer registered schmonz.com for me as a gag gift. Last week, I learned from Twitterbrausen that in German, “Schmonz” means something akin to “bullshit”. That’s not what my employer had meant by it; I consider nonetheless that my incessant blogging has acquired a fine new patina of significance.

As I recall, when I was first looking for web server software, there was not a wide variety to choose from. Apache was popular and featureful, a safe default choice. As a novice programmer, I was very much taken with the idea of building dynamic sites, and Apache offered many ways to go about that. Done deal.

In the intervening years, my server machine has changed several times, from Macintosh IIci to Mini-ITX box to Mac Mini to Xen virtual private server. (I’m particularly fond of the present arrangement wherein hardware is someone else’s problem and I continue to have root access.) No matter the system architecture, the OS has always been NetBSD, which remains unobtrusively thrilling, and the web server has always been Apache, which has gradually become more noisome.

Between my own sites and those of friends I’ve hosted, I’ve needed many times to adapt my Apache configuration to accommodate changes in external modules (such as mod_php), in interfaces (such as PHP via FastCGI instead), and within Apache itself (such as basic access control). Each time I was forced to revisit my config, I found myself revisiting my discomfort with its complexity. I never felt sure that I understood exactly, in its entirety, what my Apache installation would and wouldn’t do. And as a result of years of entanglement and unclarity, I never saw a way to give my users full administrative control over their own sites.

I’ve been imagining moving off Apache for a while. But it always seemed like a project, so I never did anything about it. I can’t usually afford to start on something unless I know I’m going to be able to stop soon, and I won’t usually want to stop unless I know how I can easily start next time. That leaves me needing a sequence of small-enough steps in my desired direction. Or, more precisely, two expectations: that at least one such sequence exists, and that I’ll be able to discover one as I go.

Conveniently, I’ve had plenty of professional practice at incremental problem-solving, enough to identify my first few steps and start making progress. Here’s the rest of the sequence, naming the refactorings I’ve found along the way.

Step 1: Extract Virtual Host

I wanted to see what I’d learn by persuading one site to become its own self-contained thing running its own Apache instance. I picked a relatively basic site, told the system Apache to reverse-proxy that virtual host, added just enough configuration to start a site-specific Apache on localhost, verified that as far as I could discern the site worked equally well, and cut over to the new configuration.

Inserting a proxy usually means, at the very least, server logs start reporting requests coming from the proxy’s IP rather than the browser’s. For this to be a refactoring, the system Apache needed to send an X-Forwarded-For header (it automatically does), and the site-specific Apache needed to know to look for it (by enabling the bundled mod_remoteip).
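To make the idea concrete, here is a minimal Python sketch of what “knowing to look for it” amounts to. It is not how mod_remoteip is implemented; the function name and parameters are made up for illustration. The one rule it encodes is that the header is believed only when the connection comes from a proxy I trust.

```python
import ipaddress


def client_address(peer_ip, forwarded_for, trusted_proxies):
    """Return the address to log for a request (hypothetical helper).

    peer_ip: address of the TCP peer (the proxy, when one sits in front)
    forwarded_for: raw X-Forwarded-For header value, or None if absent
    trusted_proxies: addresses allowed to vouch for a client address
    """
    trusted = {ipaddress.ip_address(p) for p in trusted_proxies}
    current = ipaddress.ip_address(peer_ip)
    if forwarded_for is None:
        return str(current)
    # Each proxy appends the address it saw, so walk the list right to left
    # and keep going only while the current hop is one we trust.
    hops = [h.strip() for h in forwarded_for.split(",") if h.strip()]
    for hop in reversed(hops):
        if current not in trusted:
            break
        try:
            current = ipaddress.ip_address(hop)
        except ValueError:
            break  # malformed entry: keep the last address we could verify
    return str(current)
```

With the system proxy on localhost, `client_address("127.0.0.1", "203.0.113.7", ["127.0.0.1"])` yields the browser’s address, while the same header arriving from an untrusted peer is ignored and the peer’s own address is logged.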

Manually starting an instance of a service usually means the system won’t automatically know how to do the same next time it boots up. For this to be a refactoring, I needed to add an entry to the site owner’s crontab. To validate that the site would continue to be served by its own Apache as well as it’d been served the old way, I rebooted the system. The site stayed up.

Step 2: Extract More Virtual Hosts

Good, because there were 17 more sites to go. Each of them would also be listening on its own non-standard port on localhost. To identify them at a glance in netstat, I added the port to /etc/services. Now I had a pattern worth repeating.

Some sites were more complex than others (PHP, language negotiation, other wrinkles), but I didn’t need to invent their configurations from scratch, merely uncover the tiny portions of the existing giant config that were relevant and copy them over.

Near the end, I couldn’t start new Apache instances without increasing some kernel IPC parameters (kern.ipc.msgmni from 40 to 80, kern.ipc.semmni from 10 to 20). This felt like a small backward step. I hoped to be able to undo it later.

It also might have felt like a small step backward to suddenly have lots more instances of Apache. But it was a large step forward in my understanding.

Step 3: Remove Dependency (on Apache Modules)

En route to that understanding, I was fairly sure I’d reduced the system Apache to a single responsibility: being a reverse HTTP proxy. To validate that it was no longer serving any other purpose, I turned off most LoadModule directives — even the typical and enabled-by-default ones — leaving only those that prevented Apache from running when I tried turning them off.

Step 4: Substitute Apache with Bozohttpd

I’d been hoping to replace Apache with bozohttpd. Now that I had small, explicit per-site configurations, I could try converting one. The site worked, but the logs were missing lots of basic information. I still think this is where I want to go, but since it’s not a refactoring, I can’t go there yet.

Step 5: Substitute Apache with Lighttpd

I tried converting the same site from Apache to Lighttpd, which is a little more featureful than bozohttpd. The site worked, and with mod_extforward enabled, its server logs were indistinguishable from Apache’s. I gzipped the now-retired Apache config to prevent it from being used by mistake while keeping it for reference, updated the site’s crontab entry to start Lighttpd instead of Apache, and rebooted. Bingo!

Step 6: Substitute More Apaches with Lighttpd

I converted a bunch more sites. After doing a few, I figured out how to extract shared configuration. Simpler sites have extremely short config files (just a few lines). More complex sites only define what’s unusual about them.

Step 7: Remove Dependency (on Apache PHP FastCGI)

With a few Apache-powered sites left to convert, I was pretty sure none of them was using PHP. To test this hypothesis, I stopped the php-fpm service. After a week, with nothing broken, I uninstalled it.

With only a few Apache-powered sites remaining, could I return kernel IPC parameters to their default values? Yes, all the Lighttpd and Apache sites ran just fine that way.

Step 8: Get Married

Getting married is the opposite of a refactoring. There’s no internal change, but many callers have new expectations.

Step 9: Substitute Remaining Apaches with Lighttpd

I expected three sites to be relatively tricky to convert:

  1. theschleiers.com needed language negotiation to provide English or German content. I didn’t want to futz with it until there was clearly no longer any urgent need for information about the wedding.
  2. agilein3minut.es needed SSL, which I wasn’t sure whether to proxy at all. Turned out to be easy to proxy because it’s the only HTTPS site I host at present, and it looks like it might continue to not be a big deal if and when I host more.
  3. schmonz.com needed fancy URL rewriting for compatibility with the site’s previous incarnation. I assumed it was going to, anyway. I wound up being able to translate most of its Apache mod_rewrite config to Lighttpd’s expressive conditional redirects, and needed hardly any special-snowflake cleverness.
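For the first of those sites, the decision the server has to make is small enough to sketch. This is a rough Python illustration of choosing English or German from a browser’s Accept-Language header, not the configuration I actually used; the function name and defaults are assumptions.

```python
def pick_language(accept_language, available=("en", "de"), default="en"):
    """Pick a content language from an Accept-Language header (sketch)."""
    choices = []
    for position, item in enumerate(accept_language.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unparsable q as "not acceptable"
        # Sort by descending quality, then by order of appearance.
        choices.append((-quality, position, tag))
    for neg_quality, _, tag in sorted(choices):
        if neg_quality == 0:
            continue  # q=0 explicitly rules the language out
        if tag == "*":
            return default
        primary = tag.split("-")[0]  # "de-DE" counts as "de"
        if primary in available:
            return primary
    return default
```

A browser sending `de-DE,de;q=0.9,en;q=0.8` gets German; one sending only `fr` falls back to the default.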

Once they were converted, there were zero remaining Apache-powered sites.

Step 10: Substitute Apache with Pound

A single Apache instance remained: the system one that was nothing but a reverse proxy to a bunch of Lighttpd instances.

Had I known that’d be its only job, I’d have chosen software designed for the purpose. I knew that now, and chose Pound. On a non-standard port, I figured out how to express a few sites’ worth of reverse proxying in Pound’s configuration language, continued until I’d translated everything in the Apache config, stopped Apache, and started Pound.
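Stripped of syntax, the whole job of the multiplexer is a lookup from the request’s Host header to a backend on localhost. The Python sketch below shows that lookup only; it is not Pound’s configuration language, and the port numbers are invented for illustration.

```python
# Hypothetical mapping of site name to the localhost port its own
# web server instance listens on. The ports are made up.
BACKENDS = {
    "schmonz.com": 8001,
    "theschleiers.com": 8002,
    "agilein3minut.es": 8003,
}


def backend_for(host_header, backends=BACKENDS):
    """Return (address, port) for a Host header, or None if unknown.

    Handles a trailing ":port" and differences in case; it does not
    handle IPv6 literals, which simply fail to match.
    """
    host = host_header.split(":")[0].strip().lower().rstrip(".")
    port = backends.get(host)
    if port is None:
        return None
    return ("127.0.0.1", port)
```

Everything else a reverse proxy does (forwarding the request, adding X-Forwarded-For, relaying the response) hangs off the answer to that one question.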

Step 11: Remove Dependency (on Apache)

Not a single Apache instance remained. To my knowledge, all sites were operating as normal. After a week, I uninstalled Apache, deleted its corresponding Unix user and group, and gzipped all its config files for reference.

Summary

Apache had been serving multiple roles. I brought the number down to zero, then got rid of it. To do that, I…

  • Decoupled Apache (the virtual-host multiplexer) from Apache (the web server)
  • Gave each site its own Apache web server instance
  • Found a suitable replacement web server and converted all instances
  • Found a suitable replacement virtual-host multiplexer and switched to it
  • Turned software off, and left it off for a while, before uninstalling

For human site visitors, all of these steps were genuine refactorings. (Atypical and automated visitors might notice the HTTP header reporting different server software.) For site owners, most of these steps were also genuine refactorings. (In a couple of cases, using the shared Lighttpd config required changing the names of log files by a small nonzero amount.)

I replaced one big application with two small ones. Better. Still, could be more better.

Room for improvement

The replacement virtual-host multiplexer (Pound) feels simple, good, and necessary, in the sense that nothing like it is included with the OS. The replacement web server (Lighttpd) feels simpler and better, by far — I understand what it’s doing, my users finally have full administrative control over their own sites, and unlike Apache, this configuration doesn’t require extra system resources — but NetBSD does include a web server, the one I experimented with in Step 4. If bozohttpd did a few more things, then “Replace Lighttpd with Bozohttpd” would be a refactoring, one that could be followed immediately by “Remove Dependency (on Lighttpd)”.

Next steps

I’ve been practicing C. In some kind of cosmic coincidence, next week I’ll be joining a project that’s being developed primarily in C. Hacking on bozohttpd will be good practice. Here’s the incremental sequence of features awaiting my next increment of time and attention, perhaps on tomorrow’s transatlantic flight:

  1. Optionally log to a file (instead of syslog or stderr)
  2. Optionally log more information (say, in Apache’s “combined” format)
  3. Optionally specify a proxy or proxies that can pass an X-Forwarded-For header whose contents we’ll use as the true client source address (for logs, access control decisions, etc.)
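As a target to aim at for the second item, here is a short Python sketch of what one line in Apache’s “combined” format looks like. The real work would be in C inside bozohttpd; this only pins down the output, and the function and parameter names are mine.

```python
from datetime import datetime, timedelta, timezone

# Spelled out so the result does not depend on the process locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def combined_log_line(host, user, when, request_line, status, size,
                      referer, user_agent):
    """Format one request in Apache's "combined" log format (sketch).

    `when` is expected to be a timezone-aware datetime so that the
    UTC offset can be printed.
    """
    timestamp = "%02d/%s/%04d:%02d:%02d:%02d %s" % (
        when.day, MONTHS[when.month - 1], when.year,
        when.hour, when.minute, when.second, when.strftime("%z"))

    def field(value):
        return "-" if value in (None, "") else str(value)

    def quoted(value):
        if value in (None, ""):
            return '"-"'
        text = str(value).replace("\\", "\\\\").replace('"', '\\"')
        return '"%s"' % text

    size_field = "-" if not size else str(size)  # "-" when no bytes sent
    return "%s - %s [%s] %s %d %s %s %s" % (
        field(host), field(user), timestamp, quoted(request_line),
        status, size_field, quoted(referer), quoted(user_agent))
```

The host argument is where the third item comes in: behind a proxy, it should be the address recovered from X-Forwarded-For rather than the proxy’s own.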

Since I believe I’ll be able to stop, I’ll be able to start. It might not be terribly long before I have more progress to share.