Understanding and Enhancing Queries
NSF Award IIS-0328825
Abstract:
World-Wide Web searchers are often frustrated by a lack of relevant
results or, more often, overwhelmed by a result set that is too large
to examine for relevance. With hundreds of millions of queries
performed each day, query logs provide a new source of knowledge that
we use to learn how users search and what they are searching for, and
to provide suggestions to future searchers.
Our goals are to learn what people search for on the Web; to provide
query suggestions for uncertain or inexperienced searchers; and to
offer relevant query terms for search engine optimization of a
website. To these ends, we analyze the bipartite graph of queries and
their results to identify useful query-query and document-document
relationships; cluster queries into topics, using both the
relationships we identify and more traditional sources of
information; adapt existing information retrieval techniques to help
identify, organize, and track not just query popularity but topic
popularity; and use information from query logs to help find
preferred queries that express a similar information need.
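To make the idea of mining the query-result bipartite graph concrete, the following is a minimal sketch, not the project's actual method. It assumes a log of hypothetical (query, result URL) pairs and scores two queries as related when their result sets overlap, using Jaccard similarity as one simple choice of overlap measure; the same function applied to the other side of the graph scores document-document relationships by shared queries. The function names and sample data are illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def build_bipartite(pairs):
    """Build both adjacency maps of the bipartite graph from (query, doc) pairs."""
    q2d = defaultdict(set)  # query -> set of documents it returned or led to
    d2q = defaultdict(set)  # document -> set of queries that reached it
    for query, doc in pairs:
        q2d[query].add(doc)
        d2q[doc].add(query)
    return q2d, d2q


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_pairs(adjacency, threshold=0.0):
    """Score every pair of nodes on one side of the graph by the overlap
    of their neighbours on the other side, keeping scores above threshold.
    Compares all pairs, so it is quadratic in the number of nodes."""
    scores = {}
    for x, y in combinations(sorted(adjacency), 2):
        s = jaccard(adjacency[x], adjacency[y])
        if s > threshold:
            scores[(x, y)] = s
    return scores


if __name__ == "__main__":
    # Hypothetical log entries, for illustration only.
    log = [
        ("jaguar car", "jaguar.com"),
        ("jaguar cars", "jaguar.com"),
        ("jaguar cars", "cars.com"),
        ("jaguar animal", "wikipedia.org/Jaguar"),
    ]
    q2d, d2q = build_bipartite(log)
    print(related_pairs(q2d))  # query-query relationships
    print(related_pairs(d2q))  # document-document relationships
```

On this toy log, "jaguar car" and "jaguar cars" share one of two distinct results and score 0.5, while the animal query relates to neither. A real query log would call for a more scalable approach than all-pairs comparison, for example iterating only over queries that share at least one document.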
The broader impacts of this work are two-fold. First, by exploiting
the untapped information present in Web search engine query traces,
this project increases the understanding of how people search on the
Web and what they are looking for. Second, we apply this knowledge to
develop algorithms and tools that support searchers as well as those
who want to be found by them. These tools, along with the datasets
collected or generated, will be made available to the research
community.
Participants:
- Faculty: Brian D. Davison
- Students: Na Dai, Lan Nie, Xiaoguang Qi
- Former members: Shruti Bhandari (MS, 2008), Vinay Goel (MS, 2006),
  Luke Sliner, YaoShuang Wang (MS, 2008), Baoning Wu (PhD, 2007)
Primary publications:
- X. Qi and B. D. Davison. Web Page Classification: Features and
  Algorithms. Accepted by ACM Computing Surveys.
- X. Qi and B. D. Davison. (2008) Classifiers without Borders:
  Incorporating Fielded Text from Neighboring Web Pages. To be
  published in Proceedings of the 31st Annual International ACM SIGIR
  Conference on Research and Development in Information Retrieval,
  Singapore, July.
- B. D. Davison, W. Zhang, and B. Wu. (2008) Connecting P2P to the
  Web: Lessons from a Prototype Gnutella-WWW Gateway. Internet
  Research, 18(3), in press.
- S. K. Bhandari and B. D. Davison. (2007) Leveraging Search Engine
  Results for Query Classification. Technical Report LU-CSE-07-013,
  Dept. of Computer Science and Engineering, Lehigh University.
- X. Qi and B. D. Davison. (2006) Knowing a Web Page by the Company It
  Keeps. In Proceedings of the 15th ACM Conference on Information and
  Knowledge Management (CIKM), pages 228-237, Arlington, VA,
  November 6-11.
- B. D. Davison and W. Zhang. (2005) Searching the Web and more --- a
  juxtaposition of online search traces. Technical Report
  LU-CSE-05-005, Dept. of Computer Science and Engineering, Lehigh
  University.
- B. D. Davison. (2004) The potential of the metasearch engine. In
  Proceedings of the Annual Meeting of the American Society for
  Information Science and Technology, Volume 41, pages 393-402,
  Providence, RI, November.
- B. D. Davison, D. G. Deschenes, and D. B. Lewanda. (2003) Finding
  Relevant Website Queries. Presented at the Twelfth International
  World Wide Web Conference, Budapest, Hungary, May.
This research grant supports, in part, a number of projects in the
WUME Lab of the Computer Science and Engineering Department at Lehigh
University. Additional WUME Lab publications may also be of interest.
Please contact the PI for access to data sets, including the 1999 and
2001 Excite query logs, the 2001 AltaVista query log, the 2003 NLANR
IRCACHE query logs, and Google query results for many of the 1999 Excite
queries.
This material is based upon work supported by the National Science
Foundation under Grant No. 0328825. Any opinions, findings, and
conclusions or recommendations expressed in this material are those of
the author(s) and do not necessarily reflect the views of the
National Science Foundation.
Last modified: 22 May 2008