Article type: Research Article
Authors: Lu, Heng-Yang [a] | Kang, Ning [a] | Li, Yun [a] | Zhan, Qian-Yi [b] | Xie, Jun-Yuan [a] | Wang, Chong-Jun [a],*
Affiliations: [a] National Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University, Nanjing, Jiangsu, China | [b] School of Digital Media, Jiangnan University, Wuxi, Jiangsu, China
Correspondence: [*] Corresponding author: Chong-Jun Wang, National Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University, Nanjing, Jiangsu 210023, China. Tel.: +86 25 89683163; Fax: +86 25 89683163; E-mail: chjwang@nju.edu.cn.
Note: [1] This article is an extended version of the paper by Lu et al. (2017) presented at the 31st AAAI Conference (San Francisco, CA, USA, February 4–9, 2017).
Abstract: The volume of short text data, such as tweets and online Q&A pairs, has increased rapidly in recent years, making it essential to organize and summarize these data automatically. Topic models are among the effective approaches, with application domains including text mining and personalized recommendation. Conventional models such as pLSA and LDA are designed for long text data and may suffer from the sparsity problem caused by the lack of words in short text scenarios. Recent studies such as BTM show that using word co-occurrence pairs is effective in relieving the sparsity problem. However, both BTM and its extended models ignore the quantifiable relationship between words. In our view, two words that are more closely related should be more likely to occur in the same topic. Based on this idea, we introduce a model named RIBS, which uses a recurrent neural network (RNN) to learn word relationships. Using the learned relationships, we further introduce a model named RIBS-Bigrams, which can display topics with bigrams. In experiments on two open-source, real-world datasets, RIBS achieves better coherence in topic discovery, and RIBS-Bigrams achieves better readability in topic display. In the document characterization task, the document representation produced by RIBS leads to better purity and entropy in clustering and higher accuracy in classification.
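The RNN-based relationship learning of RIBS is not reproduced here; as a rough, purely illustrative sketch of the BTM-style co-occurrence idea the abstract builds on, the snippet below counts unordered word pairs (biterms) across short documents and uses those counts, standing in for a learned relatedness score, to attach the most related bigrams to a topic's top words. All names and the toy corpus are hypothetical, not from the paper.

```python
from collections import Counter
from itertools import combinations

def biterms(doc_tokens):
    """All unordered word pairs in one short document (BTM-style biterms)."""
    return list(combinations(sorted(set(doc_tokens)), 2))

# Toy corpus of tokenized short texts (illustrative only).
docs = [
    ["apple", "fruit", "juice"],
    ["apple", "fruit"],
    ["juice", "drink"],
]

pair_counts = Counter()
for d in docs:
    pair_counts.update(biterms(d))

def topic_bigrams(topic_words, pair_counts, k=3):
    """Rank word pairs among a topic's top words by co-occurrence count,
    a crude stand-in for the RNN-learned relatedness used in RIBS."""
    scored = [
        ((w1, w2), pair_counts[tuple(sorted((w1, w2)))])
        for w1, w2 in combinations(topic_words, 2)
        if pair_counts[tuple(sorted((w1, w2)))] > 0
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in scored[:k]]
```

For example, given the topic words ["apple", "fruit", "juice"], the pair ("apple", "fruit") co-occurs most often in the toy corpus, so it would head the bigram display for that topic.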
Keywords: Topic model, short text, Recurrent Neural Network, bigrams
DOI: 10.3233/IDA-183842
Journal: Intelligent Data Analysis, vol. 23, no. 2, pp. 259-277, 2019