Efficient and Effective Search Services Over Archival Webs

NSF AWARD IIS-0803605

The Web is enormous and in constant flux, causing much content to be lost over time. Historical collections of web content are thus of monumental value in preserving records of significant aspects of modern society. The Internet Archive offers access to hundreds of billions of historical web page snapshots. The scale of such archives, however, presents tremendous challenges to making this content fully searchable. This NSF-funded research effort investigates efficient and effective approaches to store, index, and retrieve web content from large-scale historical archives. In addition, the temporal content and structure of the archives are mined for temporal characteristics that can improve search result ranking. Technological advances from this work will be tested, in collaboration with the Internet Archive, on content from its collections and potentially integrated into its infrastructure, enabling new archival search capabilities for the public.

Participants:

Publications:

This grant supports, in part, research in the WUME Lab of the Computer Science and Engineering Department at Lehigh University and the WEST Lab of the Computer Science and Engineering Department at NYU Poly.

This material is based upon work supported by the National Science Foundation under Grant No. 0803605 (III-COR-Medium: Efficient and Effective Search Services Over Archival Webs). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.


Last modified: 13 June 2013