$Id: README 45 2003-02-01 23:02:24Z aqua $

This is sugarplum, a spam-bot database poisoner utility. The purpose of a spam poisoner is to feed a spammer's email spider bad data -- ideally lowering the database's usefulness so much that it must be reverted, discarded or manually edited.

Installation instructions may be found in the INSTALL file.

Some background:

A web spider (many terms have been used, including 'bot,' 'web scanner,' 'web bot,' etc.; I'll use spider or web spider herein) is a program whose job is to wander through web pages in search of either some specific data or a general set of data. Spiders are how search engines are built, and they are also used in autonomous information-gathering agents known by various names. They're good tools, and useful for many things. In particular, they're supposed to obey the Robots Exclusion Standard (RES), which tells the spider where it may not go, and what it may not do (index, follow links) with a URL.

Most any technology has the potential for abuse, however, and a lot of e-mail spammers now employ specialized spiders to harvest email addresses for their various unpleasant purposes. Such spiders work more or less like any other spider, crawling around looking for email addresses to add to their database. Spam spiders, needless to say, ignore the RES spec, and many attempt to appear as innocuous as possible via unusual search patterns, randomly changing User-Agent: settings, etc.

While working for a regional ISP in 1998-99, I came across a homemade poisoner called "dauber," hacked up by the head sysadmin, Scott Doty. At the time, dauber simply printed out a page of random words, containing a few email addresses in which the remote spider's IP address was encoded.
The addresses were invalid, but spammers who sent mail to them left log entries that could be decoded to identify which spider had generated the traffic, and in some cases identified a netblock whose packets would then be routed into oblivion, world without end. A neat trick.

Sugarplum is an amalgam of several ideas for opposing spam by interfering with the use of collection spiders. I can lay claim to only three, all notions that have no doubt occurred to many others also:

1. Poison spam bots by providing them with a randomly generated tree of bad data, usually in great quantity. [credit to Scott Doty as per above, and many others]

2. Encode the harvester's IP address in teergrube (tarpit) addresses, to identify where they came from. [credit to the teergrube FAQ, slashdot.org discussion, dauber, etc.]

3. Amongst the various bad addresses, include an assortment of addresses belonging to known spammers, so that they may spam each other. [credit to soc.subculture.bondage-bdsm, circa 1997]

4. Adjust one's webserver configuration such that no matter what page a spam bot requests, it transparently receives the poison.

5. If a spambot wanders into the poison, identify it as a spambot by noting whether its User-Agent: header value changes in an inhuman fashion. [ n.b.: this functionality has been removed in v0.9.8 ]

6. Avoid counterdetection (letting the spambot know it's being poisoned) by rendering output in a fashion as close to normal human output as automatically feasible (even repeatable output, if deterministic mode is used). This involves variable HTML syntax and content, extensive randomization, vague attempts at grammar, etc. The working assumption here is that the author of the spambot is at least as smart as you are, and will notice any trick obvious enough that you yourself could pick it up.

7.
Upon positive identification of a spambot, launch out-of-band protective measures against it, such as adding its IP to a firewall deny-rule, or making point-target denial of service attacks against it. [ n.b.: this has been removed in v0.9.8 ]

Historical changes:

Sugarplum was written in 1999, according to the observed habits of spammers at the time. These have changed some since. Address harvesting, while still very common, is no longer the most prevalent method of obtaining addresses for bulk mailing (at present, dictionary attacks against large hosts seem to be the favorite approach).

Early releases of Sugarplum gained some slight notoriety for including hooks with which it could be configured to launch denial of service attacks against harvesters. While this was arguably workable at the time, as of this writing all major OS vendors have spent enough energy hardening their TCP/IP stacks to make "ping of death"-style attacks largely unviable. As of v0.9.8, this facility has been removed from sugarplum. It may reappear in some later version if a new class of counterattack against a single harvester becomes viable.

For some time it was possible to identify some spambot harvesters by their proclivity for randomizing their user-agent headers, returning a different agent on each HTTP request. This was an obvious error on the part of the spambot authors and has not been seen in some time. Sugarplum's facility for recognizing this behavior, which in any case was useful only for confidently launching a counterattack, has been removed.

How it works:

The mechanisms that make up sugarplum are:

1. A pair of dictionaries, one with words in the local language, the other with the addresses of known spammers.

2. A set of Apache mod_rewrite rules that catch spambots which identify themselves as such and map their requests back into the poison.

3. A CGI to perform the actual poisoning.
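
As a rough illustration of how these pieces might fit together, the sketch below builds one poison page from a word list and a spammer address list, and tags one decoy address with the requesting client's IP. It is not sugarplum's actual code. Python is used purely for illustration, and everything named here is an assumption: the hex encoding of the IP, the word and address lists, the tarpit.example domain, and the function names. Deterministic mode is approximated by seeding the random generator from the requested path, so the same URL yields the same page.

```python
import hashlib
import ipaddress
import random

# Hypothetical stand-ins for the two dictionaries described above.
WORDS = ["apple", "river", "quiet", "lantern", "market", "window", "copper", "garden"]
SPAMMER_ADDRESSES = ["bulk@spammer-one.example", "offers@spammer-two.example"]
TARPIT_DOMAIN = "tarpit.example"  # assumed: a domain whose mail logs you control


def encode_ip(ip: str) -> str:
    """Pack an IPv4 address into 8 hex digits (an assumed encoding)."""
    return format(int(ipaddress.IPv4Address(ip)), "08x")


def decode_ip(token: str) -> str:
    """Recover the dotted-quad address from an 8-hex-digit token."""
    return str(ipaddress.IPv4Address(int(token, 16)))


def poison_page(path: str, client_ip: str, deterministic: bool = True) -> str:
    """Return a small HTML page of filler words, decoy addresses and onward links."""
    if deterministic:
        # Seed from the path so repeated requests for one URL look identical.
        digest = hashlib.sha256(path.encode("utf-8")).digest()
        rng = random.Random(int.from_bytes(digest[:8], "big"))
    else:
        rng = random.Random()

    # One address carries the harvester's IP; one belongs to a "known spammer".
    tagged = f"{rng.choice(WORDS)}{encode_ip(client_ip)}@{TARPIT_DOMAIN}"
    addresses = [tagged, rng.choice(SPAMMER_ADDRESSES)]

    text = " ".join(rng.choice(WORDS) for _ in range(12)).capitalize() + "."
    base = path.rstrip("/")
    mailtos = [f'<a href="mailto:{a}">{a}</a>' for a in addresses]
    # Links lead deeper into the same generated tree.
    links = [
        f'<a href="{base}/{rng.choice(WORDS)}/">{rng.choice(WORDS)}</a>'
        for _ in range(3)
    ]
    body = "\n".join([f"<p>{text}</p>", *mailtos, *links])
    return f"<html><body>\n{body}\n</body></html>"


if __name__ == "__main__":
    print(poison_page("/archive/2003/", "192.0.2.17"))
    print(decode_ip("c0000211"))  # 192.0.2.17
```

When mail later arrives for a tagged address, the last eight characters of the local part can be passed to decode_ip() to see which client harvested it. A real deployment would read its word lists from files and vary the HTML far more than this does.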

Etiquette and ethical considerations:

The ethical/moral/legal implications of spam are relatively straightforward, but should nonetheless be considered all the way through before making use of sugarplum. I won't go into the various arguments here -- make up your own mind, and see the net-abuse newsgroups and related resources if you need more data. There are legitimate reasons for using address harvesters, though their utility has (indirectly) been destroyed by widespread use of harvesters for abusive purposes.

Sugarplum is capable of producing entirely random addresses, some percentage of which will coincide with legitimate addresses, or with legitimate domains having universal "blanket" delivery. Since the addresses are random, the odds of intersection with an address that cannot simply be deactivated without cost are very low, but the possibility still concerns some people. While I don't agree that it's a significant problem, as of v0.9.8 this form of randomization is disabled by default, to provide the safest possible default configuration.

Assorted random gibberish:

The name "sugarplum" is an appealing little irony; most poisons (substances poisonous to humans, anyway) taste bitter. According to the dictionary, a sugarplum is any of a number of small candies or rolled sweetmeats. A sugarplum that involves actual plums is made by caramelizing the fruit syrup and sugar in a pan. The whole point of feeding poison to the unwary is either to make it invisible and indistinguishable from normal food/drink, or else to make it deliberately tantalizing, with its poisonous nature well concealed until after copious consumption.

Sugarplum may be freely distributed, modified, etc. under terms of the GNU General Public License (GPL) v2, or at your option, any later version.

Devin Carraway
$Date: 2002/09/27 11:07:06 $