Article type: Research Article
Authors: Zhu, Longxia [a, b] | Xu, Hua [a, *] | Xu, Yunfeng [a, b] | Xiao, Yi [c] | Li, Jia [a] | Deng, Junhui [a] | Sun, Xiaomin [a] | Bai, Xiaoli [d]
Affiliations: [a] State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China | [b] School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, China | [c] Jiangxi Samton Technology Development Co. LTD, Jiangxi 330013, China | [d] Shijiazhuang Preschool Teachers College, Shijiazhuang, Hebei 050228, China
Correspondence: [*] Corresponding author: Hua Xu, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China. Tel.: +86 1062796450; Fax: +86 1062771792; E-mail: xuhua@tsinghua.edu.cn.
Abstract: With the prevalence of short texts, discovering the topics within them has become an important task. The Biterm Topic Model (BTM) is better suited to discovering topics in short texts than traditional topic models. However, challenges remain: BTM ignores document-topic semantic information when processing short texts and so fails to capture the true intentions of users. In addition, it is a static method and cannot handle streaming short texts, where each new text must be processed as soon as it arrives. To keep document-topic information and obtain the topic distribution of a new short text immediately, we propose a joint model based on online algorithms of Latent Dirichlet Allocation (LDA) and BTM, which combines the merits of both models. It not only alleviates sparsity by processing short texts with the online algorithm of BTM, namely the Incremental Biterm Topic Model (IBTM), but also keeps document-topic information with an extended LDA. Considering the differences between English and Chinese writing, we use combined words in short texts as keywords to extend the length of short texts and keep the true intentions of users. Experimental results on two real-world datasets show that our method outperforms the baseline methods. Finally, we describe an application of our method to the task of discovering user interest tags.
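For readers unfamiliar with the term, the "biterms" that BTM models are unordered word pairs co-occurring in the same short text. The sketch below illustrates only that extraction step, under the assumption that the text is already tokenized; the function name `extract_biterms` and the sample tokens are hypothetical and are not taken from the paper, and the sketch does not implement the joint LDA/IBTM model the abstract describes.

```python
from itertools import combinations


def extract_biterms(tokens):
    """Return all unordered word pairs (biterms) from one short text.

    Illustrative helper only; the name and interface are assumptions.
    """
    # Sort each pair so that (a, b) and (b, a) count as the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


# Hypothetical, already-tokenized short text.
tokens = ["topic", "model", "short", "text"]
biterms = extract_biterms(tokens)
```

Because every pair of words in a text contributes a biterm, even a four-word text yields several co-occurrence observations, which is the intuition behind BTM's ability to alleviate sparsity in short texts.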
Keywords: Streaming Chinese short text, topic discovery, topic models, online algorithms
DOI: 10.3233/IDA-183836
Journal: Intelligent Data Analysis, vol. 23, no. 3, pp. 681-699, 2019