Article type: Research Article
Authors: Hernández-Castañeda, Ángel* | Calvo, Hiram
Affiliations: Instituto Politécnico Nacional, Center for Computing Research CIC-IPN, Mexico City, Mexico
Correspondence: [*] Corresponding author: Ángel Hernández-Castañeda, Instituto Politécnico Nacional, Center for Computing Research CIC-IPN, Av. J.D. Bátiz e/ M.O. de Mendizábal, 07738, Mexico City, Mexico. E-mail: ahernandez_a12@sagitario.cic.ipn.mx.
Abstract: We identify deceptive text using several kinds of features: a continuous semantic space model based on latent Dirichlet allocation (LDA) topics, one-hot representation (OHR), syntactic information from syntactic n-grams (SN), and lexicon-based features from the Linguistic Inquiry and Word Count (LIWC) dictionary. We tested several combinations of these features to determine the best source(s) for identifying deceptive text. By selecting appropriate features, we obtained benchmark-level performance using a Naïve Bayes classifier. We tested on three available corpora: one of 800 hotel reviews, one of 600 reviews about controversial topics, and one of 236 book reviews. Combining LDA features with OHR yielded the best results, with accuracy above 80% on all tested datasets. Unlike other reference works, this combination also has the advantage of not requiring language-specific resources (e.g., SN, LIWC). Finally, we present an analysis of which features point to deceptive or truthful texts, finding that certain words can play different, sometimes even opposing, roles depending on the task being evaluated.
Keywords: Deception detection, continuous semantic space model, one-hot representation, linguistic inquiry and word count, syntactic n-grams
DOI: 10.3233/IDA-170882
Journal: Intelligent Data Analysis, vol. 21, no. 3, pp. 679-695, 2017
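The abstract pairs a one-hot representation (OHR) with a Naïve Bayes classifier. As a rough illustration of that part of the pipeline only, the sketch below builds binary word-presence vectors and trains a Bernoulli Naïve Bayes model with add-one smoothing. It is not the authors' implementation: the choice of the Bernoulli variant, the whitespace tokenizer, the function names, and the toy reviews are all assumptions made for this example, and the LDA topic features (which the paper combines with OHR) are omitted for brevity.

```python
import math

def build_vocab(docs):
    """Map each distinct lowercase token to a column index."""
    vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(vocab)}

def one_hot(doc, vocab):
    """Binary presence vector: 1 if the word occurs in the document, else 0."""
    vec = [0] * len(vocab)
    for tok in doc.lower().split():
        if tok in vocab:
            vec[vocab[tok]] = 1
    return vec

def train_nb(vectors, labels):
    """Bernoulli Naive Bayes with Laplace (add-one) smoothing."""
    model = {}
    n_docs = len(labels)
    for cls in set(labels):
        rows = [v for v, y in zip(vectors, labels) if y == cls]
        n = len(rows)
        # Smoothed estimate of P(word present | class) for every column.
        probs = [(sum(col) + 1) / (n + 2) for col in zip(*rows)]
        model[cls] = (math.log(n / n_docs), probs)
    return model

def predict(model, vec):
    """Return the class with the highest log posterior."""
    def score(cls):
        log_prior, probs = model[cls]
        return log_prior + sum(
            math.log(p if x else 1 - p) for x, p in zip(vec, probs)
        )
    return max(model, key=score)

# Toy, invented examples (NOT taken from the paper's corpora).
train_docs = [
    "my family loved this amazing luxurious hotel",
    "amazing experience my husband loved everything",
    "room was small and bathroom floor was dirty",
    "check in was slow and room was small",
]
train_labels = ["deceptive", "deceptive", "truthful", "truthful"]

vocab = build_vocab(train_docs)
X = [one_hot(d, vocab) for d in train_docs]
model = train_nb(X, train_labels)
prediction = predict(model, one_hot("the room was small", vocab))
print(prediction)
```

In a fuller version, per-document LDA topic proportions would be appended to each feature vector before training, which would call for a Naïve Bayes variant that accepts non-binary features.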