# $Id: INSTALL,v 1.1 2000/03/28 06:46:15 aqua Exp aqua $

Peachpit installation

Prerequisites:

    - perl5
    - GDBM_File (part of standard perl5 install)
    - Fcntl (likewise)
    - Web server (documentation is for Apache, but should be
      straightforward for other servers also)

1. Edit agents.conf to your satisfaction.  Comments in that file
   indicate the required format.  You might wish to add new records or
   remove existing ones, depending on what information you have
   available on the movements of censorware spiders at the time.

   The agents.conf file should first specify 'noaction' for agents you
   explicitly don't wish to tarpit.  Following that should come a set
   of rules constructed with reasonable care to avoid inadvertently
   tarpitting a legitimate web spider.  Finally, you may wish to
   include the domain names of censorware producers and their upstream
   ISPs.

   Your 'noaction' lines should be sufficiently broad to exclude all
   probable human-operated web browsers.

   To reiterate: check your agents rules carefully, to avoid harming
   legitimate web spiders.  Unlike normal spam poison, peachpit is not
   intended to be protected by a robots.txt entry.

2. Edit the egress.html file.  This should be an innocuous,
   meaningless file with minimal load time.  It should be constructed
   to inspire no interest in either a human or a spider -- a simple
   "under construction" message is one good choice.

3. Build the dictionary file.  Use the provided 'makepoisondb' script
   to accomplish this (the dictionary consists of a large number of
   GDBM entries, each keyed to an index number, plus a __COUNT__ entry
   containing the maximum valid index; generate your own if you
   prefer).

       ./makepoisondb /usr/dict/words dict.gdbm

   The dictionary file is not included, as it would substantially
   increase the total size of the peachpit archive, and is redundant
   with data already present on most systems.

4. Edit the configuration section at the top of 'tarpit' to suit your
   preferences.
   The default settings will attempt to load the agents configuration
   and dictionary from /usr/local/etc/peachpit/, and keep a log there
   also (the log grows slowly, but you might wish to use logrotate or
   similar to trim it back).

5. Copy the 'tarpit' script into a suitable directory in your web or
   CGI tree.

   Copy agents.conf and the dict.gdbm file generated in step 3 to the
   location(s) you selected in step 4 (default
   /usr/local/etc/peachpit/).  These should be readable by the user
   the CGI runs as (typically nobody, www, www-user or similar).

   Create a blank logfile (default is
   /usr/local/etc/peachpit/tarpit.log), writeable only by the user
   running the CGI.  Peachpit will exit with an error to stderr if the
   logfile is a symlink.

6. Adjust your webserver configuration to graft peachpit into your web
   document tree somewhere.  For the Apache webserver, a ScriptAlias
   works nicely for this, e.g.:

       ScriptAlias /tastytreat/ /home/httpd/cgi-bin/peachpit/tarpit/

   You would then add a link to, say, "/tastytreat/kibbles" from a
   page on or near /, tucked away where no human is likely to click on
   it (the program won't harm any agents not listed in the
   configuration, but it might confuse people).

   As an added devious tactic, you might opt to use Apache 1.3's
   mod_rewrite to rewrite all URLs requested by censorware spiders to
   URLs feeding into the tarpit -- so far as the spider is concerned
   it receives whatever page it requested, but all requests are
   actually processed by the tarpit.  For example:

       RewriteEngine on
       RewriteLogLevel 0
       RewriteCond %{HTTP_USER_AGENT} mud.?crawler [NC,OR]
       RewriteCond %{HTTP_USER_AGENT} (n2h2|birddog) [NC]
       RewriteCond %{REQUEST_URI} !^/tastytreat/
       RewriteRule .* /tastytreat/delectables [PT]

   The last RewriteCond leaves requests that already point into the
   tarpit alone.  For patterns matching known censorware spiders, see
   the agents configuration file or do your own research.  Check your
   webserver documentation for more details.

   Keep in mind that it's important not to use any consistent or
   predictable pattern when grafting peachpit.  If a predictable
   pattern exists, the maintainers of the censorware spider can
   easily instruct it to avoid URLs matching the pattern.

7. Test the installation -- first check with your choice of web
   browser that the script is working properly, then repeat the
   request with a manually specified User-Agent.  You can use a telnet
   session for this, as in:

       % telnet my.webserver.org 80
       Trying 127.1.2.3...
       Connected to localhost.
       Escape character is '^]'.
       GET /tastytreat/delectables HTTP/1.0
       User-Agent: mudcrawler

   (you'll need two newlines at the end, i.e. a blank line after the
   User-Agent header)

8. Optionally, you might wish to adjust the constants used by
   emit_tarpit() in the tarpit CGI to suit your specific needs.  The
   supplied values are intended to start with a nice glop of output
   and then get slower, the rate of output decreasing gradually until
   the connection closes or the maximum tarpit time expires (at which
   point all remaining output will be dumped without delay).
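
The manual telnet check in step 7 can also be scripted.  What follows
is only a rough sketch, not part of peachpit: the function names
build_request and probe_tarpit are made up for illustration, and it
assumes a netcat binary called "nc" is installed.  The host, path and
User-Agent are the example values from step 7.

```shell
#!/bin/sh
# Sketch only: scripted version of the telnet check from step 7.
# build_request and probe_tarpit are hypothetical helper names, and
# the use of "nc" is an assumption; neither ships with peachpit.

# Print a minimal HTTP/1.0 request for a URL path with the given
# User-Agent.  The final bare CRLF is the blank line that ends the
# headers (the "two newlines" needed in the telnet session).
build_request() {
    # $1 = URL path, $2 = User-Agent string
    printf 'GET %s HTTP/1.0\r\nUser-Agent: %s\r\n\r\n' "$1" "$2"
}

# Send the request to a host on port 80 and print the response.
probe_tarpit() {
    # $1 = host, $2 = URL path, $3 = User-Agent string
    build_request "$2" "$3" | nc "$1" 80
}

# Example invocations (not run automatically):
#   probe_tarpit my.webserver.org /tastytreat/delectables mudcrawler
#   probe_tarpit my.webserver.org /tastytreat/delectables 'Mozilla/4.0'
```

Running the probe once with a spider's User-Agent and once with a
browser-like one makes it easy to compare how the two are treated.
Some netcat variants need an extra option to exit once the server
closes the connection; check your local man page.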