This site is under all-too-frequent attack from "intelligent agents" (a.k.a. "robots", and more recently "accelerators" and "link checkers") that mindlessly download every link encountered, ultimately trying to access the entire database through the listings links. In most cases, these processes are run by well-intentioned but thoughtless neophytes, ignorant of common-sense guidelines.
(Very few of these same robot runners would ever dream of downloading entire databases via anonymous ftp, but for some reason they conceptualise www sites as somehow associated only with small and limited databases. This mentality must change --- large databases such as this one [which has millions of distinct URLs that lead to gigabytes of data] are likely to be ever more commonly exported via www.)
Following a proposed standard for robot exclusion, this site has maintained since early 2005 a file /robots.txt that specifies those URLs that are off-limits to robots.
(And this "Robots Beware" page was originally posted March 2004.)
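For robot authors unsure how to honour /robots.txt, here is a minimal sketch using Python's standard urllib.robotparser module. The rules in EXAMPLE_ROBOTS_TXT, the paths, and the "ExampleBot" user agent are hypothetical stand-ins chosen for illustration; they are not the contents of this site's actual file, which a real robot should download and parse before requesting anything else.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT this site's real /robots.txt.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /list/
Disallow: /find/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, path: str) -> bool:
    """Return True only if the parsed rules permit this agent to fetch path."""
    return parser.can_fetch(user_agent, path)


parser = build_parser(EXAMPLE_ROBOTS_TXT)
print(is_allowed(parser, "ExampleBot", "/list/some/listing"))  # False
print(is_allowed(parser, "ExampleBot", "/help/robots"))        # True
```

A robot that consults such a check before every request, and skips whatever is disallowed, stays within the posted rules.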
We are not willing to play sitting duck to nonsensical methods of "indexing" information. (Presumably you would not be terribly thrilled either if every aspiring encyclopedia editor were to send a gang of blind 600 lb gorillas to your library, armed with a photocopy machine.) We also have no intention of inconveniencing in any way our many tens of thousands of real users, just because a small handful of misconfigured miscreants --- with neither interest in, nor understanding of, our actual content --- is incapable of abiding by well-posted guidelines.
This server is configured to monitor activity and deny access to sites that violate the above guidelines. Continued rapid-fire requests from any site after access has been denied (i.e. with an HTTP 403 "Access denied" response) will be interpreted as a network attack, and we will respond accordingly --- without hesitation, and without further warning.
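On the client side, the sensible reaction to a 403 is to stop at once rather than retry. The following is a minimal sketch of such a loop, not a description of how this server works: polite_fetch, its fetch callback, and the one-second default delay are all assumptions introduced for illustration (no permitted request rate is stated here).

```python
import time
from typing import Callable, Iterable, List, Tuple


def polite_fetch(
    paths: Iterable[str],
    fetch: Callable[[str], Tuple[int, str]],
    delay_seconds: float = 1.0,  # assumed pause; no official rate is implied
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[str, str]]:
    """Fetch paths one at a time, pausing between requests.

    `fetch` is a hypothetical callback returning (status_code, body).
    The loop stops at the first 403 and issues no further requests.
    """
    results: List[Tuple[str, str]] = []
    for index, path in enumerate(paths):
        if index > 0:
            sleep(delay_seconds)  # never send requests back to back
        status, body = fetch(path)
        if status == 403:
            break  # access denied: stop entirely, do not retry
        results.append((path, body))
    return results
```

Passing fetch and sleep as parameters keeps the sketch independent of any particular HTTP library and lets it be exercised without touching the network.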
(Click here to test whether your site is in violation of the rules against our site.)
If some specific application requires relaxation of the above guidelines, contact [email protected] in advance of any attempted download. This system is not responsible for the consequences of automated downloads attempted in violation of the above guidelines.