About the Penis-Enlargement Spam Counter

Just some notes about the counter, with which I didn't want to clutter the main page:

This is mostly what gets through the filters. I use a lot of fairly elaborate spam filtration (SpamAssassin, qpsmtpd, and a number of URIBLs and DNSBLs), so most of it is either declined at SMTP delivery time or gets tossed into the spamcan. I don't usually include stuff from the spamcan unless it's particularly humorous (indeed, I can go months without looking in there). Penis enlargement spam is fairly easy to filter, as spam goes -- really, there are only so many ways to express one's desire to provide a fellow human with additionally hulksome man-meat. Because it's easy to detect, relatively little of it ever hits my inbox, so occasionally, if I happen to think of it, I grab a few from the spamcan and toss them in.

If the subject line doesn't look promising, it doesn't get included. Of course, humans have been getting remarkably good at identifying spam even when its subject lines are totally innocuous. Yay for contextual awareness.

Yes, I realize some of the "inches" added towards the base (root, bottom, whatever) are closer together than the repeating images. Hey, it's supposed to be funny, not mathematically precise or a shining example of pixel-perfect HTML design.

How it works: a couple of hand-rolled Perl scripts that extract subject lines from piped-in mail, update the tallies, and generate a new page with the appropriate number of "inches" added. The tallies are stored in either a GDBM or BerkeleyDB file, depending on which I felt better about at the time. The generated output is static.
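For the curious, the general shape of that pipeline can be sketched in a few lines. This is not the actual code (the real scripts are Perl and aren't reproduced here); it's a rough Python illustration under some loud assumptions: the matching pattern, the idea that each subject contributes the number of inches it mentions, the "inches" key, and the inch.png file name are all made up for the example, and Python's generic dbm module stands in for GDBM/BerkeleyDB.

```python
import dbm
import re
from email import message_from_string

# Hypothetical pattern: the real matching rules are not described on this page.
INCHES = re.compile(r"(\d+)\s*inch", re.IGNORECASE)


def extract_subject(raw_mail: str) -> str:
    """Pull the Subject header out of one raw message (as piped in by the MTA)."""
    return message_from_string(raw_mail).get("Subject", "")


def inches_promised(subject: str) -> int:
    """Return the number of inches a subject line mentions, or 0 if none."""
    match = INCHES.search(subject)
    return int(match.group(1)) if match else 0


def update_tally(db_path: str, inches: int) -> int:
    """Add to the running total kept in a small DBM file and return the new total."""
    with dbm.open(db_path, "c") as db:  # "c" creates the file if it is missing
        total = int(db.get(b"inches", b"0").decode()) + inches
        db[b"inches"] = str(total).encode()
    return total


def render_page(total: int) -> str:
    """Generate the static page: one copy of the (assumed) image per inch."""
    images = "\n".join('<img src="inch.png" alt="">' for _ in range(total))
    return f"<html><body>\n<p>Total: {total} inches</p>\n{images}\n</body></html>\n"
```

In use, something like this would be run once per delivered message: read the mail from standard input, pass it through extract_subject and inches_promised, call update_tally, and write the result of render_page to the static file the web server hands out.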

A takeoff on the penis enlargement spam counter appeared in Joseph Cohen's 2003 book, The Penis Book -- a charming and frequently funny book about, well, penises. Cohen did manage to contrive some better photo editing and layout than my HTML knockup.

The page is occasionally linked to by bloggers, and was linked to somewhat notably by the UCB Boalt Hall law school's Revolution is Not an AOL Keyword. It has at various times ranked fairly highly in Google's rankings for various synonyms for "penis enlargement." Indeed, traffic to the page fluctuates so strongly with this last factor that it's a handy way of knowing when Google has retuned PageRank. The page scores pretty high on a TF-IDF density computation, helping demonstrate what's wrong with blind usage of TF-IDF in a hostile document space. :)
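To make the TF-IDF complaint concrete, here is a toy sketch in Python. It assumes "density" means term frequency normalized by document length, multiplied by the usual log inverse document frequency; the three tiny "documents" are invented for the example and have nothing to do with any real index.

```python
import math
from collections import Counter


def tfidf_score(term, doc_tokens, corpus):
    """Length-normalized term frequency times log inverse document frequency."""
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for doc in corpus if term in doc)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf


# Invented toy corpus: a keyword-stuffed page, a sober article, and an unrelated page.
stuffed = "enlargement enlargement enlargement enlargement pills".split()
article = "a sober medical article discussing enlargement surgery risks and outcomes".split()
unrelated = "notes on configuring a mail server".split()
corpus = [stuffed, article, unrelated]

for name, doc in (("stuffed", stuffed), ("article", article), ("unrelated", unrelated)):
    print(name, round(tfidf_score("enlargement", doc, corpus), 3))
```

The stuffed page wins handily, simply because it repeats the term the most per word. A page that is nothing but a long list of spam subject lines looks, to this kind of scoring, like the most relevant document around.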

Incidentally, this thing is a good test for a certain stupidity in a few broken web browsers: it contains <img> tags listing the same image URI many, many times. Some badly engineered browsers blindly open a new HTTP connection for each tag and request a fresh copy of the image every time, duplicates and all, thereby turning a five-URI page load into a hundred or so requests. This always makes me happy about the quality of software engineering in the world these days.
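The difference is easy to quantify. The following Python sketch (an illustration only, not a model of any particular browser) counts how many <img> references a page contains versus how many distinct URIs they name; a sensible client only needs to fetch the latter. The file names in the usage line are hypothetical.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collect the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def image_requests(html: str):
    """Return (naive_fetches, distinct_fetches) for the images on a page."""
    parser = ImgCollector()
    parser.feed(html)
    parser.close()
    return len(parser.srcs), len(set(parser.srcs))


# Hypothetical page: the same image repeated many times, plus one other.
page = '<img src="inch.png">' * 100 + '<img src="tip.png">'
naive, distinct = image_requests(page)
print(f"naive client: {naive} requests; caching client: {distinct}")
```

The gap between the two numbers is exactly the waste described above: every request beyond the distinct count is a copy of something the client already has.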

(What I'm talking about, if you came here via a search engine or something)

(About the, er, author)