Topic modeling in short-text using non-negative matrix factorization based on deep reinforcement learning

Shahbazi, Zeinab; Byun, Yung-Cheol

doi:10.3233/JIFS-191690

Article type: Research Article

Authors: Shahbazi, Zeinab | Byun, Yung-Cheol*

Affiliations: Department of Computer Engineering, Jeju National University, Jejusi, Jeju Special Self-Governing Province, Korea

Correspondence: [*] Corresponding author. Yung-Cheol Byun, Department of Computer Engineering, Jeju National University, Jejusi 63243, Jeju Special Self-Governing Province, Korea. E-mail: yungcheolbyun@gmail.com.

Abstract: Topic modeling for short texts is a challenging and interesting problem in the machine learning and knowledge discovery domains. Millions of documents are now published on the internet from various sources. Websites cover a wide range of topics, but there is considerable overlap in topics, content, and overall source quality, which leads to data repetition and presents users with the same information. Another issue is data sparsity and ambiguity: because short texts are limited in length, they yield unsatisfactory results and return irrelevant content to end users. These issues have made short texts an active research topic, in which machine learning and knowledge discovery techniques are used to discover underlying topics in massive amounts of data. In this paper, we propose a model that combines deep reinforcement learning (RL) with semantics-assisted non-negative matrix factorization (SeaNMF) to extract meaningful underlying topics from short documents. The main objective of this work is to reduce repetitive information and data sparsity in short texts so that users obtain meaningful and relevant content. Furthermore, our proposed model examines an issue of the Seq2Seq approach from a reinforcement learning perspective and combines reinforcement learning with the SeaNMF formulation, optimized using the block coordinate descent algorithm. Moreover, we evaluate on several real-world datasets using numerical calculations and compare against several state-of-the-art models for short-text topic modeling. Based on experimental results and comparative analysis, our proposed model outperforms state-of-the-art techniques in short-document topic modeling.
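The abstract states that reinforcement learning is combined with the SeaNMF formulation and optimized by block coordinate descent. As a rough illustration of the block coordinate descent part only, the sketch below fits a plain NMF (not SeaNMF, and without any reinforcement learning) to a small term-document matrix. It uses the HALS-style update, which solves for each entry of one column of W, then each entry of one row of H, in closed form and clamps the result to stay positive. The toy vocabulary, the matrix, and the names `nmf_bcd` and `frob_error` are hypothetical and chosen for illustration; the paper's actual objective, with its semantic term and RL coupling, is not reproduced here.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(row) for row in zip(*a)]


def frob_error(a, w, h):
    """Squared Frobenius reconstruction error ||A - W H||_F^2."""
    wh = matmul(w, h)
    return sum(
        (a[i][j] - wh[i][j]) ** 2 for i in range(len(a)) for j in range(len(a[0]))
    )


def nmf_bcd(a, k, iters=200, seed=0, eps=1e-10):
    """Hypothetical helper: plain NMF, A (terms x docs) ~ W (terms x k) @ H (k x docs),
    fitted by HALS-style block coordinate descent. Each entry update is the exact
    minimizer of the squared error in that entry, clamped at eps (not 0) so the
    factors stay strictly positive."""
    rng = random.Random(seed)
    m, n = len(a), len(a[0])
    w = [[rng.random() for _ in range(k)] for _ in range(m)]
    h = [[rng.random() for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # Block 1: update W one column at a time while H is held fixed.
        aht = matmul(a, transpose(h))  # m x k
        hht = matmul(h, transpose(h))  # k x k
        for j in range(k):
            for i in range(m):
                # Half of the negative gradient of the error w.r.t. w[i][j].
                neg_grad = aht[i][j] - sum(w[i][t] * hht[t][j] for t in range(k))
                w[i][j] = max(eps, w[i][j] + neg_grad / hht[j][j])
        # Block 2: update H one row at a time while W is held fixed.
        wta = matmul(transpose(w), a)  # k x n
        wtw = matmul(transpose(w), w)  # k x k
        for j in range(k):
            for c in range(n):
                neg_grad = wta[j][c] - sum(wtw[j][t] * h[t][c] for t in range(k))
                h[j][c] = max(eps, h[j][c] + neg_grad / wtw[j][j])
    return w, h


if __name__ == "__main__":
    # Toy term-document counts with two hypothetical topics (sport, politics).
    vocab = ["goal", "match", "team", "vote", "election", "party"]
    A = [
        [2, 4, 0, 0],
        [1, 2, 0, 0],
        [1, 2, 0, 0],
        [0, 0, 1, 3],
        [0, 0, 2, 6],
        [0, 0, 1, 3],
    ]
    W, H = nmf_bcd(A, k=2)
    print("reconstruction error:", round(frob_error(A, W, H), 6))
    for j in range(2):
        top = sorted(range(len(vocab)), key=lambda i: -W[i][j])[:3]
        print(f"topic {j}:", [vocab[i] for i in top])
```

Because each entry update is the exact minimizer of a convex quadratic over the feasible interval, the reconstruction error does not increase from one sweep to the next. The columns of W can then be read as topics by sorting their term weights.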

Keywords: Topic modeling, knowledge discovery, short text, non-negative matrix factorization, machine learning

Journal: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 1, pp. 753-770, 2020

Published: 17 July 2020
