Accurate and efficient query clustering via top ranked search results

Hong, Yuan; Vaidya, Jaideep; Lu, Haibing; Liu, Wen Ming

doi:10.3233/WEB-160335

Article type: Research Article

Authors: Hong, Yuan [a, *] | Vaidya, Jaideep [b] | Lu, Haibing [c] | Liu, Wen Ming [d]

Affiliations: [a] Department of Information Technology Management, University at Albany, SUNY, USA. E-mail: hong@albany.edu | [b] Department of Management Science and Information Systems, Rutgers University, USA. E-mail: jsvaidya@business.rutgers.edu | [c] Department of Operations and Management Information Systems, Santa Clara University, USA. E-mail: hlu@scu.edu | [d] Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Canada. E-mail: l_wenmin@ciise.concordia.ca

Correspondence: [*] Corresponding author. E-mail: hong@albany.edu.

Abstract: To make search more user-friendly, commercial search engines commonly provide suggestions or recommendations for every posed query. Clustering semantically similar queries is an essential prerequisite for these applications to work well. However, clustering queries effectively is challenging, since they are usually short, incomplete, and ambiguous. Prevalent clustering methods such as K-Means or DBSCAN cannot guarantee good performance in such a high-dimensional setting. Hierarchical agglomerative clustering over users’ click-through query logs gives good results but is computationally expensive. This paper identifies a novel feature for clustering search queries based on a key insight: a query’s top-ranked search results can themselves be used to quantify query similarity. Building on this feature, we propose a new similarity metric for comparing diverse queries, which enables us to develop two efficient and accurate query clustering algorithms. We conduct comprehensive experiments to compare the accuracy of our approach against known baselines along two dimensions: 1) quantifying the cohesion and separation of the clustered queries, and 2) validating the results with real-world Internet users. The experimental results demonstrate that our two algorithms and the similarity metric generate more accurate results in significantly less time.
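
The key insight, that the top-ranked results of two queries can be compared to gauge how similar the queries are, can be illustrated with a minimal sketch. The abstract does not specify the paper's metric or algorithms, so everything below is an assumption made for illustration: the Jaccard overlap stands in for the similarity metric, the greedy single-pass grouping stands in for the clustering step, and the function names, threshold, queries, and URLs are all hypothetical.

```python
from typing import Dict, List


def topk_overlap(results_a: List[str], results_b: List[str], k: int = 10) -> float:
    """Jaccard overlap between the top-k result URLs of two queries.

    Assumption: a plain set overlap, not the similarity metric from the paper.
    """
    top_a, top_b = set(results_a[:k]), set(results_b[:k])
    if not top_a and not top_b:
        return 0.0
    return len(top_a & top_b) / len(top_a | top_b)


def greedy_cluster(
    results: Dict[str, List[str]], threshold: float = 0.3, k: int = 10
) -> List[List[str]]:
    """Put each query into the first cluster whose seed query is similar enough.

    Assumption: a simple single-pass grouping, not one of the paper's algorithms.
    """
    clusters: List[List[str]] = []
    for query in results:
        for cluster in clusters:
            seed = cluster[0]  # the first query in a cluster acts as its representative
            if topk_overlap(results[query], results[seed], k) >= threshold:
                cluster.append(query)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([query])
    return clusters


# Hypothetical queries and result URLs, made up for this example.
toy_results = {
    "jaguar car": ["cars.example/jaguar", "reviews.example/jaguar", "wiki.example/Jaguar_Cars"],
    "jaguar dealer": ["cars.example/jaguar", "reviews.example/jaguar", "dealers.example/jaguar"],
    "jaguar animal": ["wiki.example/Jaguar", "wildlife.example/jaguar"],
}
clusters = greedy_cluster(toy_results, threshold=0.3, k=3)
```

In this toy data the two car-related queries share two of their three top results and end up in one cluster, while the ambiguous query about the animal shares none and forms its own. Comparing result lists rather than query terms is what lets short, ambiguous queries be separated even when they share a keyword.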

Keywords: Web search, query log, clustering, top-k search results, clustering validation

Journal: Web Intelligence, vol. 14, no. 2, pp. 119-138, 2016

Published: 25 April 2016
