Robots or Webcrawlers Who Visit Us To Index our Pages
Note: We started compiling this page in 1995, when bandwidth was scarce and robots were less common than they are today. It gradually lost its relevance as the value of Web indexing became apparent, so we have taken down the old material whining about the waste of our (then scarce) resources!
We still welcome most robots, since they index the data on our server so people can find it more easily. But we lock out those whose file accesses are repetitive, who do not conform to the proposed Standard for Robot Exclusion (SRE), and who are not generating databases accessible to the public.
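For robot authors who want to honor the exclusion standard, here is a minimal sketch of how a well-behaved crawler can check a site's robots.txt before fetching a page, using Python's standard urllib.robotparser. The user-agent name "ExampleBot" and the URLs are hypothetical, not ours.

    # Minimal sketch: a crawler consulting robots.txt before fetching a page.
    # "ExampleBot" and the URLs below are hypothetical placeholders.
    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://www.example.org/robots.txt")  # the site's exclusion file
    rp.read()

    # Before requesting any page, ask whether this robot is allowed to fetch it.
    if rp.can_fetch("ExampleBot", "https://www.example.org/bibliography.htm"):
        print("Allowed to index this page")
    else:
        print("Excluded by robots.txt -- skip it and move on")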
The most irritating defect in the robots that visit us is the inability to respect our robot exclusion file. We have some documents that should not be indexed because they contain huge numbers of keywords and names, like our bibliography. Others are duplicates that we keep on the site for some non-Web-related reason. Downloading and indexing them just generates lots of hits from people who are not looking for bicycle helmet info, frustrating the visitor and wasting our bandwidth as the server delivers a huge file that somebody does not want! We get around that by changing page titles when we see that type of hit rising. It's a pity the faulty robots can't be taught to stay out of those areas.

In addition, a frequent failure we see in robot visits is the inability to handle relative addressing that uses the standard "../" notation to indicate a parent directory. Robots that cannot resolve that form of address generate hundreds of error messages in our error log, wasting resources for everyone and missing many of our pages. There are real differences in the effectiveness of the various search engines, and you can understand why if their robots can't even find the pages they need to index.
Another robot problem emerged with the development of "personal" robots that download entire sites for people to look through later, and revisit sites automatically to update pages that may or may not ever be looked at by the downloader. What a waste!
If you are looking for a search engine recommendation, we heartily recommend Google. We used to use their browser toolbar and found it a useful addition to the browser, but unfortunately it sends data back to Google, so we eventually scrapped it.
This page was last revised on: February 17, 2010.
Contact us.