WUME Crawler is WUME Lab's web crawler. It automatically downloads web pages and stores them for academic research use.

Frequently Asked Questions:
1. What is a crawler?
2. Why are you running WUME Crawler?
3. How can I prevent WUME Crawler from visiting my web site?
4. How often does WUME Crawler visit my web site?
5. Why does WUME Crawler visit invalid links on my web site?
6. I have further questions or concerns. How can I contact you?

Answers:

1. What is a crawler?
A crawler is a computer program that retrieves web pages and follows the hyperlinks contained in them. It is also known as a robot or spider.

2. Why are you running WUME Crawler?
We operate the WUME Crawler to collect web pages for our academic research projects. These projects include work on enhanced web page classification and search engine spam detection. See our projects page for descriptions of some of the projects in our lab.

3. How can I prevent WUME Crawler from visiting my web site?
We suggest using 'robots.txt'. 'robots.txt' is a file stored in the root directory of a web site that tells crawlers which pages of the site not to retrieve. To prevent WUME Crawler from crawling pages on your web site, add a two-line entry for it to the 'robots.txt' file: a User-agent line naming the crawler and a Disallow line listing the path to exclude.

4. How often does WUME Crawler visit my web site?
WUME Crawler is designed to send no more than one HTTP request to the same site within any 60-second period. If you think WUME Crawler is visiting your web site too often, feel free to contact us.

5. Why does WUME Crawler visit invalid links on my web site?
Typically, this happens when WUME Crawler discovers a broken or outdated link to your web site on another web page.

6. I have further questions or concerns. How can I contact you?
Please email us.
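
As an illustration of the 'robots.txt' approach in answer 3, a site owner could check a rule locally before publishing it. The following is only a minimal sketch using Python's standard `urllib.robotparser`. The user-agent token `WUME_Crawler` is a hypothetical placeholder, since the crawler's actual token is not stated on this page, and `example.com` stands in for your own site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical token: the crawler's real User-agent name is not given in this FAQ.
ASSUMED_AGENT = "WUME_Crawler"

# A two-line rule in the standard robots.txt form. It excludes the whole
# site for the assumed agent and leaves other crawlers unaffected.
ROBOTS_TXT = f"""\
User-agent: {ASSUMED_AGENT}
Disallow: /
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://example.com/page.html"
    print(is_allowed(ROBOTS_TXT, ASSUMED_AGENT, page))   # the assumed agent is blocked
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", page))  # other agents are not
```

With this rule the first check reports that the assumed agent may not fetch the page, while the second agent is still allowed. To exclude only part of a site, the `Disallow` path would name that directory instead of `/`.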
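
The request limit described in answer 4 (no more than one HTTP request to the same site within 60 seconds) can be pictured as a per-site timer. The sketch below is not WUME Crawler's actual code. The class name `PolitenessGate` and its methods are hypothetical, and it assumes that "site" means the host part of the URL.

```python
import time
from urllib.parse import urlsplit


class PolitenessGate:
    """Tracks the last request time per host (hypothetical helper)."""

    def __init__(self, min_interval: float = 60.0, clock=time.monotonic):
        self.min_interval = min_interval  # seconds between requests to one host
        self.clock = clock                # injectable so the logic can be tested
        self._last_request = {}           # host -> time of the last request

    @staticmethod
    def _host(url: str) -> str:
        # Assumption: a "site" is identified by the URL's host (and port).
        return urlsplit(url).netloc.lower()

    def seconds_until_allowed(self, url: str) -> float:
        """Return how long to wait before `url` may be requested (0.0 if now)."""
        last = self._last_request.get(self._host(url))
        if last is None:
            return 0.0
        return max(0.0, self.min_interval - (self.clock() - last))

    def record_request(self, url: str) -> None:
        """Note that a request to the host of `url` was just sent."""
        self._last_request[self._host(url)] = self.clock()
```

A crawler loop built on this idea would call `seconds_until_allowed` before each fetch, postpone the URL if the result is greater than zero, and call `record_request` right after sending the request. Different hosts are tracked separately, so waiting on one site does not delay requests to another.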
Last updated: September 13, 2005