This page gives known robots direct access to selected pages on the site, so they do not have to reach them by crawling through the normal navigation. Robots are expected to read "robots.txt"; pages other than those in the following list are blocked for robots:
/yggdrasil/Projects/WorkHorse/whtop.jsp
/yggdrasil/Projects/Ragnarok/ragtop.jsp
/yggdrasil/Projects/Trailscraps/index.jsp
/yggdrasil/Projects/BLIMP/top.jsp
/yggdrasil/Projects/index.jsp
/yggdrasil/Articles
/yggdrasil/Bob/articles.jsp
/yggdrasil/Bob/resume.jsp
/yggdrasil/Hiking/personalhikes.jsp
/yggdrasil/Hiking/personalparks.jsp
/yggdrasil/Hiking/index.jsp
/yggdrasil/Stocks/countries.jsp
/yggdrasil/Stocks/indices.jsp
/yggdrasil/Stocks/watchlist.jsp
/yggdrasil/Stocks/stockguest.jsp
/yggdrasil/robotlinks.jsp
/yggdrasil/index.jsp
/yggdrasil/welcome.jsp
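For robot authors, a minimal sketch of how a crawler might check a path and the crawl delay before fetching, using Python's standard `urllib.robotparser`. The site's real robots.txt is not reproduced on this page, so the rules below are an assumption: they allow a few of the listed paths and block everything else. The `ExampleBot` agent name and the 10-second delay are also made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, NOT the site's actual file.
# Only a few of the listed paths are shown; the delay value is an assumption.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Allow: /yggdrasil/robotlinks.jsp
Allow: /yggdrasil/index.jsp
Allow: /yggdrasil/welcome.jsp
Allow: /yggdrasil/Articles
Disallow: /
"""

rp = RobotFileParser()
rp.parse(HYPOTHETICAL_ROBOTS_TXT.splitlines())


def may_fetch(path, agent="ExampleBot"):
    """Return True if the parsed rules allow this agent to fetch the path."""
    return rp.can_fetch(agent, path)


# A listed page is allowed; an unlisted page falls through to "Disallow: /".
listed_ok = may_fetch("/yggdrasil/index.jsp")
unlisted_ok = may_fetch("/yggdrasil/Stocks/private.jsp")

# Seconds a polite robot should wait between requests (None if unspecified).
delay = rp.crawl_delay("ExampleBot")
```

Python's parser applies the first rule that matches, which is why the `Allow` lines come before `Disallow: /` in this sketch.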
If you DO start multiple new browser sessions from your IP without making multiple requests to this site in each of them, you can also trigger the "robot detection". The detection window is short (currently 15 minutes). You can either wait it out, or just poke around the site as a robot until there are enough requests from your IP that did not create new sessions. That will clear your robot status.
There are some other reasons, but they are unlikely. They are not described here, to avoid giving more tips on circumventing robot detection to robot developers who don't wish to obey the robots.txt rules, including the crawl delay.
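The session-based check described above can be pictured with a rough sketch. This is not the site's actual logic, which is deliberately not published; it only illustrates the stated idea that, within a 15-minute window, an IP looks like a robot when requests that created new sessions outweigh requests that did not. The `RobotDetector` class and the `surplus` threshold are hypothetical.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 15 * 60       # the 15-minute window mentioned in the text
MAX_NEW_SESSION_SURPLUS = 3    # hypothetical threshold, not taken from the site


class RobotDetector:
    """Flags an IP when new sessions outnumber follow-up requests in the window."""

    def __init__(self, window=WINDOW_SECONDS, surplus=MAX_NEW_SESSION_SURPLUS):
        self.window = window
        self.surplus = surplus
        # ip -> deque of (timestamp_seconds, created_new_session)
        self.events = defaultdict(deque)

    def record(self, ip, timestamp, created_new_session):
        """Remember one request and whether it started a new session."""
        self.events[ip].append((timestamp, created_new_session))

    def is_flagged(self, ip, now):
        """Return True if the IP currently looks like a robot."""
        events = self.events[ip]
        # Drop requests that have aged out of the detection window.
        while events and now - events[0][0] > self.window:
            events.popleft()
        new_sessions = sum(1 for _, created in events if created)
        follow_ups = len(events) - new_sessions
        return new_sessions - follow_ups > self.surplus


detector = RobotDetector()
ip = "192.0.2.1"  # documentation-only example address

# Five requests, each one opening a brand-new session.
for t in range(5):
    detector.record(ip, t, True)
flagged_after_new_sessions = detector.is_flagged(ip, now=10)

# Two follow-up requests that reuse an existing session.
detector.record(ip, 11, False)
detector.record(ip, 12, False)
flagged_after_follow_ups = detector.is_flagged(ip, now=13)
```

In this sketch both remedies from the text clear the flag: follow-up requests shrink the surplus, and waiting past the window discards the old requests entirely.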
© copyright, 2005-2022, Robert L. McQueer