Understanding and Enhancing Queries
NSF Award IIS-0328825
2004-2008
Abstract:
World-Wide Web searchers are often frustrated by a lack of relevant
results or, more often, overwhelmed by a result set that is too large
to examine for relevance. With hundreds of millions of queries
performed each day, query logs provide a new source of knowledge that
we use to learn how users search and what they are searching for,
and to provide suggestions to future searchers.
Our goals are to learn what people search for on the Web; to provide
query suggestions for uncertain or inexperienced searchers; and to
offer relevant query terms for search engine optimization of a
website. To these ends, we analyze the bipartite graph of queries and
their results to identify useful query-query and document-document
relationships; cluster queries into topics, using the relationships we
are able to identify as well as more traditional sources of
information; adapt existing information retrieval
techniques to help identify, organize, and track not simply query
popularity, but topic popularity; and use information from query
logs to help find preferred queries that express a similar information
need.
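As a rough illustration of the query-result bipartite graph idea, the
sketch below links two queries when they share result URLs, scores the
link by Jaccard overlap of their result sets, and groups queries into
connected components above a threshold. It is a minimal sketch under
stated assumptions, not the project's actual method: the function names,
the Jaccard measure, the 0.3 threshold, and the toy data (with made-up
example.org URLs) are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each query maps to the URLs returned for it.
TOY_RESULTS = {
    "jaguar car": ["example.org/jaguar-cars", "example.org/car-reviews",
                   "example.org/car-prices"],
    "jaguar xk price": ["example.org/jaguar-cars", "example.org/car-prices",
                        "example.org/used-cars"],
    "jaguar cat": ["example.org/big-cats", "example.org/jaguar-animal"],
    "big cats": ["example.org/big-cats", "example.org/jaguar-animal",
                 "example.org/lions"],
}


def query_similarities(results_by_query):
    """Return {(q1, q2): jaccard} for every query pair sharing a result URL."""
    # Invert the bipartite graph: URL -> set of queries that returned it.
    queries_by_url = defaultdict(set)
    for query, urls in results_by_query.items():
        for url in urls:
            queries_by_url[url].add(query)

    # Count shared URLs only for pairs that co-occur on some URL.
    shared = defaultdict(int)
    for queries in queries_by_url.values():
        for q1, q2 in combinations(sorted(queries), 2):
            shared[(q1, q2)] += 1

    sims = {}
    for (q1, q2), overlap in shared.items():
        union = len(set(results_by_query[q1]) | set(results_by_query[q2]))
        sims[(q1, q2)] = overlap / union
    return sims


def cluster_queries(results_by_query, threshold=0.3):
    """Group queries into connected components of the similarity graph."""
    parent = {q: q for q in results_by_query}

    def find(q):
        # Union-find lookup with path halving.
        while parent[q] != q:
            parent[q] = parent[parent[q]]
            q = parent[q]
        return q

    for (q1, q2), sim in query_similarities(results_by_query).items():
        if sim >= threshold:
            parent[find(q1)] = find(q2)

    clusters = defaultdict(set)
    for q in results_by_query:
        clusters[find(q)].add(q)
    return sorted(sorted(members) for members in clusters.values())
```

Calling cluster_queries(TOY_RESULTS) on the toy data separates the two
car-related queries from the two animal-related ones, even though three
of the four queries contain the word "jaguar". The same inverted index,
built from documents to queries, would give document-document
relationships in the symmetric way.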
The broader impacts of this work are twofold. By exploiting the
untapped information present in Web search engine query traces, this
project increases the understanding of how people search on the Web
and what they are looking for. We apply this knowledge to generate
algorithms and tools to support searchers as well as those who want to
be found by those searchers. These tools, as well as datasets
collected or generated, will be made available to the research
community.
Participants:
- Faculty: Brian D. Davison
- Students: Na Dai, Xiaoguang Qi
- Former members: Shruti Bhandari (MS, 2008), Vinay Goel (MS, 2006),
Lan Nie (PhD, 2008), Luke Sliner, YaoShuang Wang (MS, 2008),
Baoning Wu (PhD, 2007)
Primary publications:
- N. Dai and B. D. Davison. Topic-Sensitive Evaluation from Query Logs.
Under review.
- X. Qi and B. D. Davison. (2009). Web Page Classification: Features and
Algorithms. ACM Computing Surveys, 41(2).
- X. Qi and B. D. Davison. (2008). Classifiers without Borders:
Incorporating Fielded Text from Neighboring Web Pages. In Proceedings of
the 31st Annual International ACM SIGIR Conference on Research and
Development in Information Retrieval, pages 643-650, Singapore, July.
- B. D. Davison, W. Zhang, and B. Wu. (2008). Connecting P2P to the Web:
Lessons from a Prototype Gnutella-WWW Gateway. In Internet Research,
18(3):336-356.
- S. K. Bhandari and B. D. Davison. (2007). Leveraging Search Engine
Results for Query Classification. Technical Report LU-CSE-07-013, Dept.
of Computer Science and Engineering, Lehigh University.
- X. Qi and B. D. Davison. (2006). Knowing a Web Page by the Company It
Keeps. In Proceedings of the 15th ACM Conference on Information and
Knowledge Management (CIKM), pages 228-237, Arlington, VA, November 6-11.
- B. D. Davison and W. Zhang. (2005). Searching the Web and more --- a
juxtaposition of online search traces. Technical Report LU-CSE-05-005,
Dept. of Computer Science and Engineering, Lehigh University.
- B. D. Davison. (2004). The potential of the metasearch engine. In
Proceedings of the Annual Meeting of the American Society for
Information Science and Technology, Volume 41, pages 393-402,
Providence, RI, November 2004.
- B. D. Davison, D. G. Deschenes, and D. B. Lewanda. (2003). Finding
Relevant Website Queries. Presented at the Twelfth International World
Wide Web Conference, Budapest, Hungary, May.
This research grant supports, in part, a number of projects in the
WUME Lab of the Computer Science and Engineering Department at Lehigh
University. Additional WUME Lab publications may also be of interest.
Please contact the PI for access to query log data sets, including the
1999 and 2001 Excite query logs, the 2001 AltaVista query log, and the
2003-2007 NLANR IRCACHE query logs. In addition, we have available
Google query results for many of the 1999 Excite queries; the top-100
result sets of the top 14,000 queries from the 2001 Excite log, as sent
to Google, Yahoo, Ask Jeeves, and MSN Search; and a 2004 crawl of the
Swiss Web (20M pages, including a spam blacklist).
This material is based upon work supported by the National Science
Foundation under Grant No. 0328825. Any opinions, findings, and
conclusions or recommendations expressed in this material are those of
the author(s) and do not necessarily reflect the views of the National
Science Foundation.
Last modified: 19 June 2009