The WUME Laboratory


WUME Crawler is the web crawler of the WUME Laboratory (WUME Lab). It automatically downloads web pages and stores them for use in academic research.


Frequently Asked Questions:


1. What is a crawler?

2. Why are you running WUME Crawler?

3. How can I prevent WUME Crawler from visiting my web site?

4. How often does WUME Crawler visit my web site?

5. Why does WUME Crawler visit invalid links on my web site?

6. I have further questions or concerns. How can I contact you?



Answers:


1. What is a crawler?

    A crawler is a computer program that retrieves web pages and follows the hyperlinks they contain. It is also known as a robot or a spider.
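To make the definition concrete, here is a minimal Python sketch of the two steps a crawler repeats: retrieve a page, then follow the hyperlinks it contains. It is not WUME Crawler's implementation. The names LinkExtractor, extract_links, crawl and the fetch callable are hypothetical, and the in-memory pages dictionary stands in for real HTTP requests. A real crawler would also check robots.txt and limit its request rate before each fetch.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List, Optional
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the absolute URLs of all <a href="..."> links on one page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links such as "/about.html" against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> List[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def crawl(start_url: str, fetch: Callable[[str], Optional[str]], max_pages: int = 10) -> List[str]:
    """Breadth-first crawl: retrieve a page, queue the links it contains, repeat.

    `fetch` is a hypothetical callable that returns a page's HTML, or None
    when the page cannot be retrieved (for example, a broken link).
    """
    seen = {start_url}
    queue = deque([start_url])
    retrieved: List[str] = []
    while queue and len(retrieved) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # invalid link: nothing to store or follow
        retrieved.append(url)
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return retrieved


if __name__ == "__main__":
    # A tiny in-memory "web" standing in for real HTTP requests.
    pages = {
        "http://a.example/": '<a href="/b.html">B</a>',
        "http://a.example/b.html": '<a href="/">Home</a> <a href="/old.html">Old</a>',
    }
    print(crawl("http://a.example/", pages.get))  # ['http://a.example/', 'http://a.example/b.html']
```

In the demo, /old.html is still requested even though no such page exists, simply because /b.html links to it: a crawler cannot tell that a link is broken until it tries to retrieve it.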


2. Why are you running WUME Crawler?

    We operate the WUME Crawler to collect web pages for our academic research projects. These projects include work on enhanced web page classification and search engine spam detection.  See our projects page for descriptions of some of the projects in our lab.


3. How can I prevent WUME Crawler from visiting my web site?

    We suggest using 'robots.txt'. 'robots.txt' is a file stored in the root directory of a web site that tells crawlers which pages of the site they should not retrieve. To prevent WUME Crawler from crawling pages on your web site, include the following two lines in your 'robots.txt' file:

User-Agent: wume_crawler
Disallow: /

Further information about the Robots Exclusion Protocol is available at robotstxt.org. As an alternative, you may contact us to request that WUME Crawler not crawl any pages on your web site.
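Before deploying these lines, you can check how a parser that follows the Robots Exclusion Protocol interprets them. The sketch below uses Python's standard urllib.robotparser module; it is not WUME Crawler's own code, and the URL www.example.com and the agent name SomeOtherBot are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The two lines suggested above, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-Agent: wume_crawler
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Ask Python's standard robots.txt parser whether `user_agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("wume_crawler", "http://www.example.com/index.html"))  # False
    print(is_allowed("SomeOtherBot", "http://www.example.com/index.html"))  # True
```

Because the rule names only wume_crawler, other crawlers are unaffected; replacing the User-Agent value with * would instead block every crawler that honors the file.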


4. How often does WUME Crawler visit my web site?

    WUME Crawler is designed to send no more than one HTTP request to the same site within any 60-second period. If you think WUME Crawler is visiting your web site too often, please contact us.
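The 60-second limit described above amounts to a per-site politeness delay. Below is a minimal Python sketch of that idea, assuming each hostname counts as one site; the class and method names are hypothetical, and this is not WUME Crawler's actual scheduler. Timestamps are passed in explicitly so the logic can be checked without waiting.

```python
import time
from typing import Callable, Dict, TypeVar
from urllib.parse import urlsplit

T = TypeVar("T")

MIN_DELAY_SECONDS = 60.0  # at most one request to the same site per 60 seconds


class PolitenessScheduler:
    """Hypothetical per-site rate limiter; each hostname is treated as one site."""

    def __init__(self, min_delay: float = MIN_DELAY_SECONDS) -> None:
        self.min_delay = min_delay
        self._last_request: Dict[str, float] = {}  # hostname -> time of last request

    @staticmethod
    def _site(url: str) -> str:
        # urlsplit already lowercases the hostname.
        return urlsplit(url).hostname or ""

    def wait_time(self, url: str, now: float) -> float:
        """Seconds that must still pass before `url` may be requested at time `now`."""
        last = self._last_request.get(self._site(url))
        if last is None:
            return 0.0
        return max(0.0, self.min_delay - (now - last))

    def record_request(self, url: str, now: float) -> None:
        self._last_request[self._site(url)] = now

    def fetch_politely(self, url: str, fetch: Callable[[str], T]) -> T:
        """Sleep until the site's delay has elapsed, then call the hypothetical `fetch`."""
        delay = self.wait_time(url, time.monotonic())
        if delay > 0:
            time.sleep(delay)
        self.record_request(url, time.monotonic())
        return fetch(url)


if __name__ == "__main__":
    sched = PolitenessScheduler()
    sched.record_request("http://www.example.com/a.html", now=0.0)
    print(sched.wait_time("http://www.example.com/b.html", now=15.0))  # 45.0
    print(sched.wait_time("http://other.example.org/", now=15.0))      # 0.0
```

Sleeping, as fetch_politely does, keeps the sketch short; a crawler that visits many sites would more likely work on other sites while one site's delay runs out.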


5. Why does WUME Crawler visit invalid links on my web site?

    Typically, this happens when WUME Crawler finds a broken or outdated link to your web site on another web page.


6. I have further questions or concerns. How can I contact you?

    Please email



Last updated: September 13, 2005