Refactorings for web hosting

About Amitai

[he/they] Software development coach and speaker. Itinerant programmer. Legacy code wrestler. Agile in 3 Minutes podcaster. Musician. Bad poet (award-winning).

I’ve been running a public web server since 1999, when my employer registered schmonz.com for me as a gag gift. Last week, I learned from Twitterbrausen that in German, “Schmonz” means something akin to “bullshit”. That’s not what my employer had meant by it; I consider nonetheless that my incessant blogging has acquired a fine new patina of significance.

As I recall, when I was first looking for web server software, there was not a wide variety to choose from. Apache was popular and featureful, a safe default choice. As a novice programmer, I was very much taken with the idea of building dynamic sites, and Apache offered many ways to go about that. Done deal.

In the intervening years, my server machine has changed several times, from Macintosh IIci to Mini-ITX box to Mac Mini to Xen virtual private server. (I’m particularly fond of the present arrangement wherein hardware is someone else’s problem and I continue to have root access.) No matter the system architecture, the OS has always been NetBSD, which remains unobtrusively thrilling, and the web server has always been Apache, which has gradually become more noisome.

Between my own sites and those of friends I’ve hosted, I’ve needed many times to adapt my Apache configuration to accommodate changes in external modules (such as mod_php), to interfaces (such as PHP via FastCGI instead), and within Apache itself (such as basic access control). Each time I forcibly revisited my config, I found myself revisiting my discomfort with its complexity. I never felt sure that I understood exactly, in its entirety, what my Apache installation would and wouldn’t do. And as a result of years of entanglement and unclarity, I never saw a way to give my users full administrative control over their own sites.

I’ve been imagining moving off Apache for a while. But it always seemed like a project, so I never did anything about it. I can’t usually afford to start on something unless I know I’m going to be able to stop soon, and I won’t usually want to stop unless I know how I can easily start next time. That leaves me needing a sequence of small-enough steps in my desired direction. Or, more precisely, two expectations: that at least one such sequence exists, and that I’ll be able to discover one as I go.

Conveniently, I’ve had plenty of professional practice at incremental problem-solving, enough to identify my first few steps and start making progress. Here’s the rest of the sequence, naming the refactorings I’ve found along the way.

Step 1: Extract Virtual Host

I wanted to see what I’d learn by persuading one site to become its own self-contained thing running its own Apache instance. I picked a relatively basic site, told the system Apache to reverse-proxy that virtual host, added just enough configuration to start a site-specific Apache on localhost, verified that as far as I could discern the site worked equally well, and cut over to the new configuration.

Inserting a proxy usually means, at the very least, server logs start reporting requests coming from the proxy’s IP rather than the browser’s. For this to be a refactoring, the system Apache needed to send an X-Forwarded-For header (it automatically does), and the site-specific Apache needed to know to look for it (by enabling the bundled mod_remoteip).
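
A minimal sketch of the two halves of that arrangement, written as shell functions that print Apache configuration. The hostname, the port, and the function names are placeholders rather than the settings actually used here; ProxyPass, ProxyPassReverse, ProxyPreserveHost, RemoteIPHeader, and RemoteIPInternalProxy are real mod_proxy and mod_remoteip directives, and the module-loading lines are left out because their paths vary by installation.

```sh
#!/bin/sh
# Sketch only: host and port are supplied by the caller and are placeholders.

# Print a front-end (system Apache) virtual host that reverse-proxies
# everything for one site to a backend listening on localhost.
print_frontend_vhost() {
    host=$1 port=$2
    cat <<EOF
<VirtualHost *:80>
    ServerName $host
    ProxyPreserveHost On
    ProxyPass        / http://127.0.0.1:$port/
    ProxyPassReverse / http://127.0.0.1:$port/
</VirtualHost>
EOF
}

# Print the backend (site-specific Apache) lines that tell mod_remoteip to
# take the client address from X-Forwarded-For when the peer is localhost.
print_backend_remoteip() {
    cat <<'EOF'
RemoteIPHeader X-Forwarded-For
RemoteIPInternalProxy 127.0.0.1
EOF
}
```

Something like `print_frontend_vhost example.com 8001` would produce the block to paste into the system config, and `print_backend_remoteip` the lines for the per-site one.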

Manually starting an instance of a service usually means the system won’t automatically know how to do the same next time it boots up. For this to be a refactoring, I needed to add an entry to the site owner’s crontab. To validate that the site would continue to be served by its own Apache as well as it’d been served the old way, I rebooted the system. The site stayed up.
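
One way to add such a crontab entry without opening an editor, as a hedged sketch: a filter that appends an `@reboot` line only if it is not already there. The function name and the command in the usage line are hypothetical.

```sh
#!/bin/sh
# Read an existing crontab on stdin and print it back with one @reboot
# entry appended, unless an identical line is already present.
#   $1 = the command to run at boot (placeholder; supplied by the caller)
with_reboot_entry() {
    entry="@reboot $1"
    awk -v entry="$entry" '
        { print }
        $0 == entry { found = 1 }
        END { if (!found) print entry }
    '
}
```

As the site owner, usage might look like `crontab -l 2>/dev/null | with_reboot_entry "/path/to/httpd -f $HOME/etc/httpd.conf" | crontab -`, where the httpd path and config location are assumptions.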

Step 2: Extract More Virtual Hosts

Good, because there were 17 more sites to go. Each of them would also be listening on its own non-standard port on localhost. To identify them at a glance in netstat, I added the port to /etc/services. Now I had a pattern worth repeating.
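
For illustration, an /etc/services line is just a name, a tab, and `port/protocol`. A tiny sketch that formats one (the service name and port in the usage line are made up):

```sh
#!/bin/sh
# Print an /etc/services line naming a site's backend port.
#   $1 = service name (hypothetical, chosen by the admin)
#   $2 = TCP port the site-specific server listens on
services_line() {
    printf '%s\t%s/tcp\n' "$1" "$2"
}
```

After appending, say, the output of `services_line www-example 8001` to /etc/services, a netstat run without `-n` should show the name in place of the bare port number.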

Some sites were more complex than others (PHP, language negotiation, other wrinkles), but I didn’t need to invent their configurations from scratch, merely uncover the tiny portions of the existing giant config that were relevant and copy them over.

Near the end, I couldn’t start new Apache instances without increasing some kernel IPC parameters (kern.ipc.msgmni from 40 to 80, kern.ipc.semmni from 10 to 20). This felt like a small backward step. I hoped to be able to undo it later.
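
A sketch of how those two settings could be written down, assuming they go in NetBSD's /etc/sysctl.conf as `name=value` lines. The defaults below are the raised values quoted above; the function name is hypothetical.

```sh
#!/bin/sh
# Print sysctl.conf-style lines for the two System V IPC limits.
#   $1 = value for kern.ipc.msgmni (defaults to 80)
#   $2 = value for kern.ipc.semmni (defaults to 20)
print_ipc_settings() {
    msgmni=${1:-80} semmni=${2:-20}
    printf 'kern.ipc.msgmni=%s\n' "$msgmni"
    printf 'kern.ipc.semmni=%s\n' "$semmni"
}
```

Each printed line can also be applied to the running system with `sysctl -w`, and calling `print_ipc_settings 40 10` shows the defaults to return to later.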

It also might have felt like a small step backward to suddenly have lots more instances of Apache. But it was a large step forward in my understanding.

Step 3: Remove Dependency (on Apache Modules)

En route to that understanding, I was fairly sure I’d reduced the system Apache to a single responsibility: being a reverse HTTP proxy. To validate that it was no longer serving any other purpose, I turned off most LoadModule directives — even the typical and enabled-by-default ones — leaving only those that prevented Apache from running when I tried turning them off.
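
The turn-one-off-and-see loop can be sketched as a small filter that comments out a single LoadModule line. This is an illustration, not the procedure actually followed; the function name is hypothetical.

```sh
#!/bin/sh
# Read an Apache config on stdin and print it with the LoadModule line
# for one module commented out.
#   $1 = module name without the "_module" suffix, e.g. "autoindex"
disable_module() {
    sed -E "s/^([[:space:]]*LoadModule[[:space:]]+${1}_module[[:space:]])/#\\1/"
}
```

After each change, `apachectl configtest` reports whether Apache would still start; if it would not, the module goes back on.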

Step 4: Substitute Apache with Bozohttpd

I’d been hoping to replace Apache with bozohttpd. Now that I had small, explicit per-site configurations, I could try converting one. The site worked, but the logs were missing lots of basic information. I still think this is where I want to go, but since it’s not a refactoring, I can’t go there yet.

Step 5: Substitute Apache with Lighttpd

I tried converting the same site from Apache to lighttpd, which is a little more featureful than bozohttpd. The site worked, and with mod_extforward enabled, its server logs were indistinguishable from Apache’s. I gzipped the now-retired Apache config to prevent it from being used by mistake while keeping it for reference, updated the site’s crontab entry to start Lighttpd instead of Apache, and rebooted. Bingo!
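
A rough idea of what a per-site lighttpd config of this kind might contain, printed by a shell function. The document root, port, and function name are placeholders; `server.modules`, `server.bind`, `server.port`, `server.document-root`, and `extforward.forwarder` are real lighttpd options, but this is not the config used on this server, and lighttpd's mod_extforward documentation is the place to check how module order interacts with logging.

```sh
#!/bin/sh
# Print a minimal lighttpd config for one site listening on localhost and
# trusting X-Forwarded-For only from the local reverse proxy.
#   $1 = document root (placeholder)
#   $2 = port on 127.0.0.1 (placeholder)
print_lighttpd_site() {
    docroot=$1 port=$2
    cat <<EOF
server.modules       = ( "mod_accesslog", "mod_extforward" )
server.bind          = "127.0.0.1"
server.port          = $port
server.document-root = "$docroot"
extforward.forwarder = ( "127.0.0.1" => "trust" )
EOF
}
```

The output can be written to a file and checked with `lighttpd -t -f thatfile` before anything is started from it.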

Step 6: Substitute More Apaches with Lighttpd

I converted a bunch more sites. After doing a few, I figured out how to extract shared configuration. Simpler sites have extremely short config files (just a few lines). More complex sites only define what’s unusual about them.

Step 7: Remove Dependency (on Apache PHP FastCGI)

With a few Apache-powered sites left to convert, I was pretty sure none of them was using PHP. To test this hypothesis, I stopped the php-fpm service. After a week, with nothing broken, I uninstalled it.

With only a few Apache-powered sites remaining, could I return kernel IPC parameters to their default values? Yes, all the Lighttpd and Apache sites ran just fine that way.

Step 8: Get Married

Getting married is the opposite of a refactoring. There’s no internal change, but many callers have new expectations.

Step 9: Substitute Remaining Apaches with Lighttpd

I expected three sites to be relatively tricky to convert:

theschleiers.com needed language negotiation to provide English or German content. I didn’t want to futz with it until there was clearly no longer any urgent need for information about the wedding.
agilein3minut.es needed SSL, which I wasn’t sure whether to proxy at all. Turned out to be easy to proxy because it’s the only HTTPS site I host at present, and it looks like it might continue to not be a big deal if and when I host more.
schmonz.com needed fancy URL rewriting for compatibility with the site’s previous incarnation. I assumed it was going to, anyway. I wound up being able to translate most of its Apache mod_rewrite config to Lighttpd’s expressive conditional redirects, and needed hardly any special-snowflake cleverness.

Once they were converted, there were zero remaining Apache-powered sites.

Step 10: Substitute Apache with Pound

A single Apache instance remained: the system one that was nothing but a reverse proxy to a bunch of Lighttpd instances.

Had I known that’d be its only job, I’d have chosen software designed for the purpose. I knew that now, and chose Pound. On a non-standard port, I figured out how to express a few sites’ worth of reverse proxying in Pound’s configuration language, continued until I’d translated everything in the Apache config, stopped Apache, and started Pound.

Step 11: Remove Dependency (on Apache)

Not a single Apache instance remained. To my knowledge, all sites were operating as normal. After a week, I uninstalled Apache, deleted its corresponding Unix user and group, and gzipped all its config files for reference.

Summary

Apache had been serving multiple roles. I brought the number down to zero, then got rid of it. To do that, I…

Decoupled Apache (the virtual-host multiplexer) from Apache (the web server)
Gave each site its own Apache web server instance
Found a suitable replacement web server and converted all instances
Found a suitable replacement virtual-host multiplexer and switched to it
Turned software off, and left it off for a while, before uninstalling

For human site visitors, all of these steps were genuine refactorings. (Atypical and automated visitors might notice the HTTP header reporting different server software.) For site owners, most of these steps were also genuine refactorings. (In a couple cases, using the shared Lighttpd config required changing the names of log files by a small nonzero amount.)

I replaced one big application with two small ones. Better. Still, could be more better.

Room for improvement

The replacement virtual-host multiplexer (Pound) feels simple, good, and necessary, in the sense that nothing like it is included with the OS. The replacement web server (Lighttpd) feels simpler and better, by far — I understand what it’s doing, my users finally have full administrative control over their own sites, and unlike Apache, this configuration doesn’t require extra system resources — but NetBSD does include a web server, the one I experimented with in Step 4. If bozohttpd did a few more things, then “Replace Lighttpd with Bozohttpd” would be a refactoring, one that could be followed immediately by “Remove Dependency (on Lighttpd)”.

Next steps

I’ve been practicing C. In some kind of cosmic coincidence, next week I’ll be joining a project that’s being developed primarily in C. Hacking on bozohttpd will be good practice. Here’s the incremental sequence of features awaiting my next increment of time and attention, perhaps on tomorrow’s transatlantic flight:

Optionally log to a file (instead of syslog or stderr)
Optionally log more information (say, in Apache’s “combined” format)
Optionally specify a proxy or proxies that can pass an X-Forwarded-For header whose contents we’ll use as the true client source address (for logs, access control decisions, etc.)
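
The third item carries the most logic, so here is a minimal sketch of the decision it implies, written in shell for brevity. It is not bozohttpd code, and the function name and argument layout are hypothetical: the header is believed only when the connection comes from a listed proxy, and then only its last entry is used, since that is the one the trusted proxy itself appended.

```sh
#!/bin/sh
# Print the address to treat as the client.
#   $1 = address of the TCP peer
#   $2 = value of the X-Forwarded-For header (empty when absent)
#   remaining arguments = addresses of trusted proxies
client_address() {
    peer=$1 xff=$2
    shift 2
    for proxy in "$@"; do
        if [ "$peer" = "$proxy" ] && [ -n "$xff" ]; then
            # Keep only the rightmost comma-separated entry, minus spaces.
            printf '%s\n' "${xff##*,}" | tr -d ' '
            return 0
        fi
    done
    # Untrusted peer or no header: the peer address is all we can rely on.
    printf '%s\n' "$peer"
}
```

With `127.0.0.1` as the only trusted proxy, a request from an unlisted peer keeps its own address no matter what header it sends.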

Since I believe I’ll be able to stop, I’ll be able to start. It might not be terribly long before I have more progress to share.

that's a great writeup on untangling web services!

That’s a great writeup on untangling web services!

I just wanted to mention that I’ve found “haproxy” (http://haproxy.com/, and in pkgsrc) to be perhaps the best and simplest to use HTTP proxy server available. It isn’t (as of ) yet compatible with RFC 7239, but I don’t think there are many parsers for the new “Forwarded” header (which is, as with most recent IETF standards, stupidly complex and difficult to parse). We define custom “X-” headers with one datum per header for easy parsing on the destination server (which in our case is custom C code from scratch). I can send you a sample config file if you’re interested.

One particular feature of haproxy I found much easier to use is forwarding for WebSockets. Doing that in Apache was stupidly complex and it barely works.

I too have been searching for a new web server, especially for low-volume sites, and I too have found bozohttpd somewhat limiting. I tried lighttpd, but I got sick of its bugs quite quickly and gave up on it, especially after trying to submit fixes for some security-related bugs and getting unsatisfactory responses. I’ve not yet fully tested nostromo (http://www.nazgul.ch/dev_nostromo.html) but it looks promising.

— Greg A. Woods

Comment by Greg A. Woods — September 12, 2016 at 03:58:08 PM EDT

btw, I got an error when posting, but posting worked....

When I posted my comment I got an error from your web server, but before trying again I refreshed the article and I see my comment.

Comment by Greg A. Woods — September 12, 2016 at 04:00:34 PM EDT

updates

Sorry about the error! I don’t know what it could have been, so if you ever see it again, please catch whatever details you can.

Last night I got around to testing the hypothesis that “it might continue to not be a big deal if and when I host more [HTTPS sites].” For one more site, at least, it’s proven. This site is now HTTPS-only! Here’s the small Pound diff. Zero changes to the site’s Lighttpd config. Feels good.

HAProxy looks very good and had been on my list, along with h2o. If at some point Pound stops meeting my needs, I’ll try it.

For bozohttpd, I’ve made it possible to run the tests in a “no news is good news” configuration. Now I can hack happily in the manner to which I have grown accustomed. :-)

Comment by Amitai Schleier — September 29, 2016 at 07:45:12 AM EDT

Remove Dependency (on mod_magnet and custom Lua code)

When I ported this site from Apache to Lighttpd, some of the configuration could only be expressed using mod_magnet and a Lua script.

Now I’ve got a few months of log data. The Lua script was being called for exactly two redirects of nonzero value (7 hits each, not counting bots). So I’ve turned mod_magnet off, gzipped the script for reference, and added these lines to the site config:

# [Feeds + Non-feeds] rewrite uppercase "tags" that had been Textpattern "categories"
"^/tag/Assignments(.*)" => "/tag/assignments$1",
"^/tag/Music(.*)"       => "/tag/music$1",

Comment by Amitai Schleier — November 15, 2016 at 01:21:26 PM EST

Rename (site URL)

Oh yeah, this site has finally joined the modern era. Not only SSL, but also no more www. in front. Old URLs still work, of course! This required no change to the site’s Lighttpd config, just a simple-enough-to-guess-right change in the host’s Pound config.

BTW, a very handy way to manually check redirects is curl -I -L "http://www.old.url/that/should/get/redirected".
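
When a chain has several hops, the header dump from curl gets long. A small filter, sketched here with a hypothetical name, boils it down to one line per hop:

```sh
#!/bin/sh
# Read HTTP response headers (as printed by curl -sIL) on stdin and print
# the status code and Location target of each redirect, then the status
# code of the final response.
redirect_chain() {
    tr -d '\r' | awk '
        /^HTTP\//                  { status = $2 }
        tolower($1) == "location:" { print status, $2 }
        END                        { print status, "(final)" }
    '
}
```

Usage would be along the lines of `curl -sIL "http://www.old.url/that/should/get/redirected" | redirect_chain`.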

Comment by Amitai Schleier — November 15, 2016 at 01:32:00 PM EST

Rename (attachment URLs)

I was about to share a link to an MP3 from an old Schmonzcast, and was reminded that with ikiwiki I’m free to move podcast attachments from old Textpattern-mandated locations (/file_download/16/foo.mp3) to sensible spots (/year/month/date/post-title/foo.mp3, right next to its show notes). With a few shell pipelines, I moved the files, updated references to them, and generated Lighttpd redirects so old links keep working. Then I shared the nice-looking link to that MP3, and observed the utter absence of a top-level file_download directory in my repo, and was happy. :-D
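
The redirect-generating part of such a pipeline might look like the sketch below. It assumes the redirects are entries in a lighttpd `url.redirect` list, where the left-hand side is a regular expression; the function name and the new path in the usage line are invented for illustration.

```sh
#!/bin/sh
# Print one url.redirect entry mapping an old path to its new location.
# Dots in the old path are escaped because lighttpd treats it as a regex.
#   $1 = old path, $2 = new path
redirect_line() {
    old=$(printf '%s' "$1" | sed 's/\./\\./g')
    printf '"^%s$" => "%s",\n' "$old" "$2"
}
```

For example, `redirect_line /file_download/16/foo.mp3 /2009/05/01/some-post/foo.mp3` prints a line ready to paste between the parentheses of `url.redirect = ( ... )`.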

Comment by Amitai Schleier — November 16, 2016 at 10:45:59 PM EST

Apache returns!!1

After all that, I needed Apache again. Just a little bit.

Comment by Amitai Schleier — June 29, 2017 at 06:21:06 PM EDT

crontabs -> daemontools

When I upgrade my server every week, one of the things I haven’t been doing is restarting all the various site-specific web server instances. Since they were started from cron via @reboot entries, I hadn’t given myself a programmatic way to bring the processes down (or back up).

I’m a big step closer, because the crontab entries have been replaced with daemontools. My setup:

The system starts an svscan /var/service as root (from /etc/rc.d)
The services in /var/service are per-user instances of svscan $HOME/service
The services in each user’s $HOME/service correspond to what had been in their crontab

With a small shell script, I can then enumerate all the non-root svscan instances, along with the user-managed services they supervise:

:; sudo lsvscan

/var/service/svscan-schleierdav: up (pid 97) 7528 seconds
  /home/schleierdav/service/apache.photos.theschleiers.com: up (pid 476) 7527 seconds
  /home/schleierdav/service/gallery.photos.theschleiers.com: up (pid 505) 7527 seconds

/var/service/svscan-schmonz: up (pid 105) 7528 seconds
  /home/schmonz/service/agilein3minut.es: up (pid 202) 7527 seconds
  /home/schmonz/service/implemications.com: up (pid 500) 7527 seconds
  /home/schmonz/service/schmonz.com: up (pid 726) 7527 seconds
  /home/schmonz/service/theschleiers.com: up (pid 170) 7527 seconds

/var/service/svscan-shapemywork: up (pid 98) 7528 seconds
  /home/shapemywork/service/shapemywork.com: up (pid 309) 7527 seconds

Now I can probably just add svc -t /home/*/service/* to my weekly upgrade script.
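
The lsvscan script itself isn't shown, so here is a hypothetical reconstruction of how such a helper could work given the layout described above: walk the `svscan-USER` directories under /var/service, run daemontools' `svstat` on each, then on every service in that user's `$HOME/service`. The function name, the `SVSTAT` override, and the assumption that home directories live under /home are all mine.

```sh
#!/bin/sh
# Hypothetical lsvscan-style helper. SVSTAT can be overridden for dry runs;
# by default it is daemontools' svstat.
: "${SVSTAT:=svstat}"

# Print the status of each per-user svscan and the services it supervises.
#   $1 = system service directory (defaults to /var/service)
#   $2 = root of home directories (defaults to /home; an assumption)
list_supervised() {
    system_dir=${1:-/var/service}
    home_root=${2:-/home}
    for scan in "$system_dir"/svscan-*; do
        [ -d "$scan" ] || continue
        user=${scan##*/svscan-}
        $SVSTAT "$scan"
        for svc in "$home_root/$user"/service/*; do
            [ -d "$svc" ] || continue
            # Indent user-managed services under their svscan.
            $SVSTAT "$svc" | sed 's/^/  /'
        done
        echo
    done
}
```

Because `svc -t` sends each service a TERM signal and supervise then restarts it, the same directory walk is all the weekly upgrade script needs.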

Comment by Amitai Schleier — August 6, 2017 at 12:49:49 AM EDT

Let's Encrypt

After limping through a few manual Let’s Encrypt renewals — sometimes too late — I’ve scripted it with acme-tiny. Each user that wants SSL creates $HOME/.letsencrypt. For each site that wants SSL, they create letsencrypt/{cert,service} subdirectories. A shared lighttpd config fragment handles the Let’s Encrypt challenge URL. letsencrypt/service/run looks like so:

#!/bin/sh

exec 2>&1
while true; do
    letsencrypt_create_or_renew schmonz.com mail.schmonz.com www.schmonz.com
    sleep 1200000
done

Most sites provide only one argument to letsencrypt_create_or_renew. This service directory is then symlinked into $HOME/service. Since my system upgrade script runs svc -t /home/*/service/* (as threatened in the previous comment), these run scripts get restarted approximately once a week. If I skip a system rebuild, that’s fine for SSL purposes: letsencrypt_create_or_renew doesn’t bother talking to Let’s Encrypt servers anyway, unless the cert is more than 15 days old. Once a month, a system cronjob restarts all SSL-aware services, thereby reloading any certificates which may have been updated. Since Let’s Encrypt certs last 90 days, this is probably more than enough automation. I’ll check the logs (and cert expiration dates) in a month to make sure.
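
letsencrypt_create_or_renew isn't shown here, so the following is only a guess at how its "don't bother unless the cert is old enough" check could be done: by file modification time, using find. The function name and the choice of mtime as the age measure are assumptions.

```sh
#!/bin/sh
# Succeed (exit 0) when a certificate file is missing or was last modified
# more than the given number of days ago, as counted by find -mtime in
# whole 24-hour periods.
#   $1 = path to the certificate file
#   $2 = age threshold in days (defaults to 15)
cert_needs_renewal() {
    cert=$1 days=${2:-15}
    [ -f "$cert" ] || return 0
    [ -n "$(find "$cert" -mtime +"$days")" ]
}
```

A renewal script could then begin with `cert_needs_renewal "$certfile" || exit 0`, so that frequent restarts cost nothing and never touch the Let’s Encrypt servers.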

Comment by Amitai Schleier — March 11, 2018 at 02:06:38 PM EDT

pound -> sniproxy

In September, tired of various annoyances with pound as a reverse proxy, I started sniproxy on non-standard ports (one for plaintext, one for SSL) and began reconfiguring lighttpd sites to be reachable both ways.

Motivations for replacing pound:

It’d periodically eat CPU, needing to be restarted
Each time I added a new site (especially with SSL), I had to follow what felt like too many steps in too careful an order
Because it has to terminate SSL, one of those steps was giving it the site’s cert and private key /!\
To listen on IPv6, nearly the entire config would have to be duplicated

Getting the first site running under sniproxy while keeping it running under pound was tricky, because sniproxy by design does a much smaller job. The main trick turned on this bit of (oversimplified) logic: if there’s an X-Forwarded-For request header, we’re running under pound as before; else we’re under sniproxy, and if the URL scheme isn’t already https then we need to redirect.
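
Spelled out as code, that (oversimplified) logic is a three-way decision. The real version lives in lighttpd configuration; this shell rendering, with a hypothetical function name, is only meant to make the branches explicit.

```sh
#!/bin/sh
# Print "serve" or "redirect" for a request arriving at a site's lighttpd.
#   $1 = X-Forwarded-For header value (empty when absent)
#   $2 = URL scheme of the request as lighttpd sees it
front_end_action() {
    if [ -n "$1" ]; then
        # Header present: the request came through pound, as before.
        echo serve
    elif [ "$2" = "https" ]; then
        # No header, so sniproxy passed it along, and it is already https.
        echo serve
    else
        # Under sniproxy and not https yet.
        echo redirect
    fi
}
```

Only the last branch is new behavior for the site, which is what let both front ends coexist during the transition.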

After a few sites had been reconfigured, I extracted the common lighttpd bits. After a few more, I figured out how to further simplify sniproxy.conf.

The last site to convert was the sole Apache instance, and the fresh tedium of that effort reminded me how much I wanted to get rid of Apache.

Cutover was easy. Once pound was off, the shared lighttpd config became simpler. While I was in there, I tweaked it for better grades from SSL site scanners. A few weeks ago, with sniproxy never once having required my attention, I deleted pound. Here’s my sniproxy.conf.

Since three years ago, when I grudgingly brought back Apache, lighttpd has grown what appears to be a full-featured WebDAV implementation. I eagerly took the liberty of deleting Apache, even though I don’t have lighttpd’s WebDAV working well yet. When this step has become a refactoring, I’ll comment again.

Comment by Amitai Schleier — October 27, 2020 at 06:14:31 AM EDT

WebDAV: Apache -> lighttpd

Deleting Apache again was fun and rewarding all by itself, plus it lent some urgency to reviving my WebDAV service (rarely used though it is) with lighttpd. Steps to success:

Have the lighttpd developers fix a segfault on NetBSD
Ignore the remaining problems for a week
Take a fresh look at my config, notice how it’s not equivalent to what Apache’s had been, and fix that
Ta da!

The mistake was easier to make because photos.theschleiers.com has always been a little clever. It serves the same URLs two different ways: to browsers (by reverse-proxying to Nathan’s photo gallery app), and to WebDAV clients (by not doing that). I still think this cleverness is worth it, especially now that it’s working again.

Here’s my old Apache config and my new lighttpd config.

Comment by Amitai Schleier — November 5, 2020 at 03:02:53 PM EST

Retiring MySQL

A few days ago I upgraded PHP from 7.3 to 7.4 to keep up with pkgsrc’s default. It went fine, probably. As a result, today I tried an in-place upgrade from MySQL 5.7 to MariaDB 10.4. It wasn’t one. Fortunately I had a quick way to be running MySQL again.

This site’s been database-free for years. If my users can switch their stuff to SQLite, I can get out of the database administrator business. As a first pass, I extracted my own databases (whatever they are) from the system MySQL to a standalone instance running as me.

Comment by Amitai Schleier — September 20, 2021 at 04:02:39 PM EDT

What might be next

Switching my mail server from ucspi-ssl to s6-networking has been on my TODO list for a while. I haven’t done it yet, but now that s6-networking supports Server Name Indication, I’m intrigued by the idea of letting it handle TLS and networking, and running a much simpler and smaller webserver like publicfile (or httpfile). I’d need at least CGI support, which it looks like I could add with shttpd.

Comment by Amitai Schleier — October 16, 2021 at 11:17:56 AM EDT

Comments on this page are closed.