Affiliations: [a] Collaborative Innovation Center for 21st Century Maritime Silk Road Studies, Guangdong University of Foreign Studies, Guangzhou, China | [b] School of Informatics, Guangdong University of Foreign Studies, Guangzhou, China | [c] School of Business, Guangdong University of Foreign Studies, Guangzhou, China | [d] Faculty of Built Environment, University of New South Wales, Sydney, Australia
Correspondence: [*] Corresponding author. E-mail: yingyinqu2@gmail.com. The University of New South Wales, UNSW Sydney, NSW 2052, Australia.
Abstract: Detecting similar questions is a fundamental research problem in constructing similar-question datasets for research on question answering, short text similarity calculation, and sentence paragraphing. This paper examines the assumption made in previous work on similar question detection and analyzes its problems. We then propose an automated approach to detecting similar questions based on the calculation of question topical diversity, using different topical feature generation methods. The experimental dataset consists of 4,482,757 Yahoo! questions with answers. The results show that our approach achieves, at its best, a precision of 74% and a recall of 74%, outperforming the baseline methods and demonstrating its effectiveness in similar question group detection.
Keywords: Similar question detection, topical diversity, similarity calculation, question-answering