Journal of Intelligent & Fuzzy Systems - Volume 39, issue 2 - Journals

Self-attention for Twitter sentiment analysis in Spanish

Authors: González, José Ángel | Hurtado, Lluís-F. | Pla, Ferran

Article Type: Research Article

Abstract: This paper describes our proposal for Sentiment Analysis in Twitter for the Spanish language. The main characteristics of the system are the use of word embeddings specifically trained on tweets in Spanish and the use of self-attention mechanisms that make it possible to process sequences without convolutional or recurrent layers. These self-attention mechanisms are based on the encoders of the Transformer model. The results obtained on Task 1 of the TASS 2019 workshop, for all the Spanish variants proposed, support the correctness and adequacy of our proposal.

Keywords: Twitter, sentiment analysis, transformer encoders

DOI: 10.3233/JIFS-179881

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2165-2175, 2020

Price: EUR 27.50

Ranking based multi-label classification for sentiment analysis

Authors: Chen, Dengbo | Rong, Wenge | Zhang, Jianfei | Xiong, Zhang

Article Type: Research Article

Abstract: This paper proposes a sentiment analysis framework based on ranking learning. The framework uses a BERT model pre-trained on large-scale corpora to extract text features and has two sub-networks for different sentiment analysis tasks. The first sub-network of the framework consists of multiple fully connected layers and intermediate rectified linear units. The main purpose of this sub-network is to learn the presence or absence of various emotions using the extracted text information, and the supervision signal comes from the cross entropy loss function. The other sub-network is a ListNet. Its main purpose is to learn a distribution that approximates the real distribution of different emotions using the correlation between them. Afterwards the predicted distribution can be used to sort the importance of emotions. The two sub-networks of the framework are trained together and can contribute to each other to avoid the deviation from a single network. The framework proposed in this paper has been tested on multiple datasets and the results have shown the proposed framework’s potential.

Keywords: Sentiment analysis, multi-label classification, ranking

DOI: 10.3233/JIFS-179882

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2177-2188, 2020

Price: EUR 27.50

Psychological attachment style prediction based on short biographies

Authors: Calvo, Hiram | Gutiérrez-Hinojosa, Sandra J. | Rocha-Ramírez, Arturo P. | Moreno-Armendáriz, Marco A.

Article Type: Research Article

Abstract: In this work we test the hypothesis that the words subjects use can be used to predict their psychological attachment style (secure, fearful, dismissing, preoccupied) as defined by Bartholomew and Horowitz. In order to verify this hypothesis, we collected a series of autobiographic texts written by a set of 202 participants. Additionally, a psychological instrument (Frías questionnaire) was applied to these same participants to measure their attachment style. We identified characteristic patterns for each style of attachment by means of two approaches: (1) mapping words into a word space model composed of unigrams, bigrams and/or trigrams on which different classifiers were trained (Naïve Bayes (NB), Bernoulli NB, Multinomial NB, Multilayer Perceptrons); and (2) using a word-embedding based representation and a neural network architecture based on different units (LSTM, Gated Recurrent Units (GRU) and Bilateral GRUs). We obtained the best accuracy of 0.4079 for the first approach by using a Boolean Multinomial NB on unigrams, bigrams and trigrams altogether, and an accuracy of 0.4031 for the second approach using Bilateral GRUs.

Keywords: Psychological attachment, autobiography, text classification, bilateral gated recurrent units, anxiety-avoidance attachment model

DOI: 10.3233/JIFS-179883

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2189-2199, 2020

Price: EUR 27.50

Sentiment analysis in Nepali: Exploring machine learning and lexicon-based approaches

Authors: Piryani, Rajesh | Piryani, Bhawna | Singh, Vivek Kumar | Pinto, David

Article Type: Research Article

Abstract: In recent times, sentiment analysis research has gained tremendous momentum for English textual data; however, very little research has focused on Nepali textual data. This work focuses on Nepali textual data. We have explored machine learning approaches and proposed a lexicon-based approach using linguistic features and lexical resources to perform sentiment analysis for tweets written in the Nepali language. This lexicon-based approach first pre-processes the tweet, locates the opinion-oriented features, and then computes the sentiment polarity of the tweet. We have investigated both conventional machine learning models (Multinomial Naïve Bayes (NB), Decision Tree, Support Vector Machine (SVM) and logistic regression) and deep learning models (Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) and CNN-LSTM) for sentiment analysis of Nepali text. These machine learning models and the lexicon-based approach have been evaluated on tweet datasets related to the Nepal Earthquake 2015 and the Nepal blockade 2015. The lexicon-based approach outperformed the conventional machine learning models. The deep learning models outperformed both the conventional machine learning models and the lexicon-based approach. As a by-product, we have also created the Nepali SentiWordNet and Nepali SenticNet sentiment lexicons from existing English language resources.

Keywords: Lexicon-based sentiment analysis, Nepali language, Twitter sentiment analysis, Nepali SentiWordNet, Nepali SenticNet, deep learning, sentiment analysis

DOI: 10.3233/JIFS-179884

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2201-2212, 2020

Price: EUR 27.50

Using weighted directed graphs for identification of flow of emotions in poems

Authors: Sreeja, P. S. | Mahalakshmi, G. S.

Article Type: Research Article

Abstract: A poem is a spontaneous flow of emotions. There are several emotion detection systems to identify emotions from speech, gestures, and text (blogs, newspapers, stories and medical reports). Since such systems do not exist for poetry, we take the first step in building a system to recognize emotions in poetry by constructing a benchmark corpus, the PERC (Poem Emotion Recognition Corpus), of poems written by Indian poets in English. In this research a novel graphical method, Poem Emotion Trajectory System (PETS), is proposed to depict the flow of emotion in a poem. PETS is based on the construction of a weighted directed graph as a means to represent the emotion flow among the verses of a given poem. The weights represent the transition probability among the emotion states considered. The significant advantage is that a dominant path for each emotion category is identified. Emotion flow along verses is analyzed using a graph-based approach. This method, applied to each emotion category, generalizes the emotion flow in each emotion class. PETS can be applied in poetry therapy and to enhance creative thinking and writing.

Keywords: Poem emotion recognition corpus, emotion recognition, emotion analysis, poem emotion trajectory system, poem emotion trajectory graph, dominant emotion flow trajectory, natural language processing, artificial intelligence

DOI: 10.3233/JIFS-179885

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2213-2227, 2020

Price: EUR 27.50

Ranking concrete and abstract words using Google Books Ngram data

Authors: Ivanov, Vladimir | Solovyev, Valery

Article Type: Research Article

Abstract: Creation of dictionaries of abstract and concrete words is a well-known task. Such dictionaries are important in several applications of text analysis and computational linguistics. Usually, the process of assembling concreteness scores for words begins with a lot of manual work. However, the process can be automated significantly using information from large corpora. In this paper we combine two datasets, a dictionary with concreteness scores of 40,000 English words and the Google Books Ngram dataset, in order to test the following hypothesis: in text, concrete words tend to occur with more concrete words than with abstract words (and conversely, abstract words tend to occur with more abstract words than with concrete words). Using the hypothesis, we propose a method for automatic evaluation of concreteness scores of words using a small amount of initial markup.

Keywords: Concreteness of words, bigrams, dictionary

DOI: 10.3233/JIFS-179886

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2229-2237, 2020

Price: EUR 27.50

Deep fusion of multiple term-similarity measures for biomedical passage retrieval

Authors: Rosso-Mateus, Andrés | Montes-y-Gómez, Manuel | Rosso, Paolo | González, Fabio A.

Article Type: Research Article

Abstract: Passage retrieval is an important stage of question answering systems. Closed domain passage retrieval, e.g. biomedical passage retrieval, presents additional challenges such as specialized terminology, more complex and elaborated queries, and scarcity in the amount of available data, among others. However, closed domains also offer some advantages such as the availability of specialized structured information sources, e.g. ontologies and thesauri, that could be used to improve retrieval performance. This paper presents a novel approach for biomedical passage retrieval which is able to combine different information sources using a similarity matrix fusion strategy based on a convolutional neural network architecture. The method was evaluated over the standard BioASQ dataset, a dataset specialized in biomedical question answering. The results show that the method is an effective strategy for biomedical passage retrieval, able to outperform other state-of-the-art methods in this domain.

Keywords: Biomedical passage retrieval, neural networks, question answering, deep learning

DOI: 10.3233/JIFS-179887

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2239-2248, 2020

Price: EUR 27.50

iHDT++: improving HDT for SPARQL triple pattern resolution

Authors: Hernández-Illera, Antonio | Martínez-Prieto, Miguel A. | Fernández, Javier D. | Fariña, Antonio

Article Type: Research Article

Abstract: RDF self-indexes compress the RDF collection and provide efficient access to the data without a previous decompression (via the so-called SPARQL triple patterns). HDT is one of the reference solutions in this scenario, with several applications to lower the barrier of both publication and consumption of Big Semantic Data. However, the simple design of HDT takes a compromise position between compression effectiveness and retrieval speed. In particular, it supports scan and subject-based queries, but it requires additional indexes to resolve predicate and object-based SPARQL triple patterns. A recent variant, HDT++, improves HDT compression ratios, but it does not retain the original HDT retrieval capabilities. In this article, we extend HDT++ with additional indexes to support full SPARQL triple pattern resolution with a lower memory footprint than the original indexed HDT (called HDT-FoQ). Our evaluation shows that the resultant structure, iHDT++, requires 70-85% of the original HDT-FoQ space (and up to 48-72% for an HDT Community variant). In addition, iHDT++ shows significant performance improvements (up to one order of magnitude) for most triple pattern queries, being competitive with state-of-the-art RDF self-indexes.

Keywords: HDT, RDF compression, triple pattern resolution, SPARQL, linked data

DOI: 10.3233/JIFS-179888

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2249-2261, 2020

Price: EUR 27.50

Measuring semantic similarity of documents with weighted cosine and fuzzy logic

Authors: Huetle-Figueroa, Juan | Perez-Tellez, Fernando | Pinto, David

Article Type: Research Article

Abstract: Semantic analysis is currently used in different fields, such as information retrieval, the biomedical domain, and natural language processing. The primary focus of this research work is on using semantic methods, the cosine similarity algorithm, and fuzzy logic to improve the matching of documents. The algorithms were applied to plain texts, in this case CVs (resumes) and job descriptions. Synsets of WordNet were used to enrich the semantic similarity methods such as the Wu-Palmer Similarity (WUP), Leacock-Chodorow similarity (LCH), and path similarity (hypernym/hyponym). Additionally, keyword extraction was used to create a postings list where keywords were weighted. The task of recruiting new personnel in the companies that publish job descriptions and, reciprocally, finding a company when workers publish their resumes is discussed in this research work. The creation of a new gold standard was required to achieve a comparison of the proposed methods. A web application was designed to match the documents manually, creating the new gold standard. The new gold standard thereby confirmed the benefits of enriching the cosine algorithm semantically. Finally, the results were compared with the new gold standard to check the efficiency of the new methods proposed. The measures used for the analysis were precision, recall, and f-measure, concluding that the cosine similarity weighted semantically can be used to get better similarity scores.

Keywords: Semantic similarity, semantic matching, document similarity, cosine enrichment, keyword enrichment

DOI: 10.3233/JIFS-179889

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2263-2278, 2020

Price: EUR 27.50

Cross-dataset email classification

Authors: Morales, Valentin | Gomez, Juan Carlos | Van Amerongen, Saskia

Article Type: Research Article

Abstract: Email is one of the most popular ways of communication. Nevertheless, it is also a potential tool to deceive and fill users with unwanted publicity, which reduces productivity. To alleviate this, a common solution has been building machine learning models based on the content of emails to automatically separate emails (spam vs ham). In this work, a study of a set of machine learning models and content-based features for the problem of cross-dataset email classification is presented. This problem consists in training and testing the models using different datasets, considering that the datasets were collected under different independent setups. This has the purpose of simulating future variable or unpredictable conditions in the email content distributions as could happen in a real setting, where models are trained using emails from a certain period of time, group of users or accounts, but tested with emails from other users or accounts. Experiments were conducted with the models and features using different datasets and two setups, same-dataset and cross-dataset, to show the complexity of the latter. The performance was evaluated using the Area Under the ROC Curve, a common metric in email classification. The results show interesting insights for the problem.

Keywords: Email classification, data mining, machine learning, cross-dataset classification

DOI: 10.3233/JIFS-179890

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2279-2290, 2020

Price: EUR 27.50

Natural ontologies with elastic matching for elicited knowledge comparison

Authors: Calvo, Hiram | Figueroa-Nazuno, Jesús | Mandujano, Ángel

Article Type: Research Article

Abstract: Natural Ontologies are presented in this work as a useful tool to model the way in which concepts are organized inside the human mind. In order to be compared, ontologies are represented as matrices and an elastic matching technique is used. For this purpose, a distance measure called Modern Fréchet is proposed, which is an approximation to the NP-Complete problem of elastic matching between matrices. An applied case study is presented in which human knowledge is compared among different groups of people in the Computer Science domain.

Keywords: Natural ontologies, modern fréchet, ontology elicitation, elastic matching, dynamic time warping

DOI: 10.3233/JIFS-179891

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2291-2303, 2020

Price: EUR 27.50

A computational model for speech disorders using problematic phonemes with ontological reasoning

Authors: Vázquez González, Stephanie | Somodevilla García, María

Article Type: Research Article

Abstract: This work presents a method for data gathering to construct a corpus related to speech disorders in children; this corpus will serve as the base to generate some semi-automatic ontologies, in order to become a computational model to support therapists for diagnosis and possible treatment. Speech disorders, phonemes and some additional information are classified using taxonomies obtained from speech disorders specialized literature. Based on the obtained taxonomies, the ontologies, which structure and formalize concepts defined by the main topic authors, are developed. The ontologies are constructed following some parts of classic methodologies and their subsequent validation is made through competency questions. The development of the model is based on Natural Language Processing (NLP) and Information Retrieval (IR) techniques. Integration of the ontologies is made to be able to make a classification based on problematic phonemes; this is suggested as a complement to the diagnostic tool in the model.

Keywords: Corpus building, ontology, speech disorders, problematic phonemes

DOI: 10.3233/JIFS-179892

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2305-2315, 2020

Price: EUR 27.50

PIANI: An ontology-based platform for outdoor activity recognition and non-intrusive assistance to the elderly

Authors: Gomez-Montalvo, Jorge | Lopez, Melchor | Curi, Fernando | Moo-Mena, Francisco | Menendez, Victor

Article Type: Research Article

Abstract: In this paper, we introduce a Platform for Non-Intrusive Assistance (named PIANI), as an assistance platform for elderly people able to do activities in outdoor environments without strict supervision. PIANI includes an ontology used to characterize outdoor activities of interest (activities to be observed). PIANI also defines a risk level of the activity that an elderly person is currently doing outside the home by comparing such activity to its characterization. In addition, the proposed platform uses the smartphone of the person in order to collect geographic and time information, which is used by PIANI to infer activity risk and send alert notifications based on a semantic knowledge base. An experimental test was developed as a proof of concept about the utilization of PIANI to identify outdoor activities of elderly people, compute a level of risk and finally send a non-intrusive alert notification to the user.

Keywords: Ambient assisted living, outdoor activity recognition, ontologies

DOI: 10.3233/JIFS-179893

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2317-2329, 2020

Price: EUR 27.50

Author detection: Analyzing tweets by using a Naïve Bayes classifier

Authors: Abascal-Mena, Rocío | López-Ornelas, Erick

Article Type: Research Article

Abstract: In the context of digital social media, where users have multiple ways to obtain information, it is important to have tools to detect the authorship within a corpus supposedly created by a single author. With the tremendous amount of information coming from social networks there is a lot of research concerning author profiling, but there is a lack of research about authorship identification. In order to detect the author of a group of tweets, a Naïve Bayes classifier is proposed, which is an automatic algorithm based on Bayes’ theorem. The main objective is to determine if a particular tweet was made by a specific user or not, based on its content. The data used correspond to a simple data set, obtained with the Twitter API, composed of four political accounts accompanied by their username and tweet identifier as it is mixed with multiple user tweets. To describe the performance of the classification model and interpret the obtained results, a confusion matrix is used as it contains values like accuracy, sensitivity, specificity, Kappa measure, the positive predictive and negative predictive value. These results show that the prediction model, after several cases of use, has acceptable values against the observed probabilities.

Keywords: Naïve Bayes classifier, authorship detection, social network analysis, Twitter, confusion matrix

DOI: 10.3233/JIFS-179894

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2331-2339, 2020

Price: EUR 27.50

#Brexit: Leave or remain? the role of user’s community and diachronic evolution on stance detection

Authors: Lai, Mirko | Patti, Viviana | Ruffo, Giancarlo | Rosso, Paolo

Article Type: Research Article

Abstract: Interest has grown around the classification of stance that users assume within online debates in recent years. Stance has been usually addressed by considering users’ posts in isolation, while social studies highlight that social communities may contribute to influence users’ opinion. Furthermore, stance should be studied in a diachronic perspective, since it could help to shed light on users’ opinion shift dynamics that can be recorded during the debate. We analyzed the political discussion in the UK about the BREXIT referendum on Twitter, proposing a novel approach and annotation schema for stance detection, with the main aim of investigating the role of features related to social network community and diachronic stance evolution. Classification experiments show that such features provide very useful clues for detecting stance.

Keywords: Stance detection, Twitter, brexit, NLP, community detection

DOI: 10.3233/JIFS-179895

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2341-2352, 2020

Price: EUR 27.50

A study of deep learning methods for same-genre and cross-genre author profiling

Authors: Ashraf, Muhammad Adnan | Adeel Nawab, Rao Muhammad | Nie, Feiping

Article Type: Research Article

Abstract: The aim of the author profiling task is to automatically predict various traits of an author (e.g. age, gender, etc.) from written text. The problem of author profiling has been mainly treated as a supervised text classification task. Initially, traditional machine learning algorithms were used by the researchers to address the problem of author profiling. However, in recent years, deep learning has emerged as a state-of-the-art method for a range of classification problems related to image, audio, video, and text. No previous study has carried out a detailed comparison of deep learning methods to identify which method(s) are most suitable for same-genre and cross-genre author profiling. To fill this gap, the main aim of this study is to carry out an in-depth and detailed comparison of state-of-the-art deep learning methods, i.e. CNN, Bi-LSTM, GRU, and CRNN along with proposed ensemble methods, on four PAN Author Profiling corpora. PAN 2015 corpus, PAN 2017 corpus and PAN 2018 Author Profiling corpus were used for same-genre author profiling whereas PAN 2016 Author Profiling corpus was used for cross-genre author profiling. Our extensive experimentation showed that for same-genre author profiling, our proposed ensemble methods produced the best results for the gender identification task, whereas the CNN model performed best for the age identification task. For cross-genre author profiling, the GRU model outperformed all other approaches for both age and gender.

Keywords: Author profiling, deep learning, gender identification, ensemble methods, age identification, same-genre author profiling, cross-genre author profiling

DOI: 10.3233/JIFS-179896

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2353-2363, 2020

Price: EUR 27.50

Predicting consumers engagement on Facebook based on what and how companies write

Authors: Rosas-Quezada, Érika S. | Ramírez-de-la-Rosa, Gabriela | Villatoro-Tello, Esaú

Article Type: Research Article

Abstract: Engaged customers are a very important part of current social media marketing. Public figures and brands have to be very careful about what they post online. That is why the need for accurate strategies for anticipating the impact of a post written for an online audience is critical to any public brand. Therefore, in this paper, we propose a method to predict the impact of a given post by accounting for the content, style, and behavioral attributes as well as metadata information. To validate our method, we collected Facebook posts from 10 public pages; we performed experiments with almost 14000 posts and found that the content and the behavioral attributes from posts provide relevant information to our prediction model.

Keywords: Social media branding, impact analysis, data mining, features engineering, natural language processing

DOI: 10.3233/JIFS-179897

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2365-2377, 2020

Price: EUR 27.50

Author profiling on bi-lingual tweets

Authors: Ashraf, Muhammad Adnan | Nawab, Rao Muhammad Adeel | Nie, Feiping

Article Type: Research Article

Abstract: The task of author profiling aims to distinguish the author’s profile traits from a given content. It has potential applications in marketing, forensic analysis, fake profile detection, etc. In recent years, the usage of bi-lingual text has risen due to the global reach of social media tools as people prefer to use language that expresses their true feelings during online conversations and assessments. It has likewise impacted the use of bi-lingual (English and Roman-Urdu) text in the sub-continent (Pakistan, India, and Bangladesh) over social media. To develop and evaluate methods for bi-lingual author profiling, benchmark corpora are needed. The majority of previous efforts have focused on developing mono-lingual author profiling corpora for English and other languages. To fill this gap, this study aims to explore the problem of author profiling on bi-lingual data and presents a benchmark corpus of bi-lingual (English and Roman-Urdu) tweets. Our proposed corpus contains 339 author profiles and each profile is annotated with six different traits including age, gender, education level, province, language, and political party. As a secondary contribution, a range of deep learning methods, CNN, LSTM, Bi-LSTM, and GRU, are applied and compared on the three different bi-lingual corpora for age and gender identification, including our proposed corpus. Our extensive experimentation showed that the best results for both the gender identification task (Accuracy = 0.882, F1-Measure = 0.839) and age identification (Accuracy = 0.735, F1-Measure = 0.739) are obtained using the Bi-LSTM deep learning method. Our proposed bi-lingual tweets corpus is free and publicly available for research purposes.

Keywords: Twitter, author profiling, roman-urdu, deep learning, bi-lingual, gender identification

DOI: 10.3233/JIFS-179898

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2379-2389, 2020

Price: EUR 27.50

Authorship attribution of Spanish poems using n-grams and the web as corpus

Authors: Guzmán-Cabrera, Rafael

Article Type: Research Article

Abstract: In many areas of professional development, the categorization of textual objects is of critical importance. A prominent example is the attribution of authorship, where symbolic information is manipulated using natural language processing techniques. In this context, one of the main limitations is the necessity of a large number of pre-labeled instances for each author that is to be identified. This paper proposes a method based on the use of n-grams of characters and the use of the web to enrich the training sets. The proposed method considers the automatic extraction of the unlabeled examples from the Web and their iterative integration into the training data set. The evaluation of the proposed approach was done by using a corpus formed by poems corresponding to 5 contemporary Mexican poets. The results presented allow evaluating the impact of the incorporation of new information into the training set, as well as the role played by the selection of classification attributes using information gain.

Keywords: Authorship attribution, self-training, web corpora

DOI: 10.3233/JIFS-179899

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2391-2396, 2020

Price: EUR 27.50

Unsupervised extractive multi-document text summarization using a genetic algorithm

Authors: Neri-Mendoza, Verónica | Ledeneva, Yulia | García-Hernández, René Arnulfo

Article Type: Research Article

Abstract: The task of Extractive Multi-Document Text Summarization (EMDTS) aims at building a short summary with essential information from a collection of documents. In this paper, we propose an EMDTS method using a Genetic Algorithm (GA). The fitness function considers two unsupervised text features: sentence position and coverage. We propose the binary coding representation, selection, crossover, and mutation operators. We test the proposed method on the DUC01 and DUC02 data sets; four different tasks (summary lengths of 200 and 400 words) are tested for each of the collections of documents (876 documents in total). Besides, we analyze the most frequently used methodologies for summarization. Moreover, different heuristics such as topline, baseline, baseline-random, and lead baseline are calculated. In the results, the proposed method improves on the state-of-the-art results.

Keywords: Genetic algorithm, heuristics, unsupervised, extractive multi-document text, summarization

DOI: 10.3233/JIFS-179900

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2397-2408, 2020

Price: EUR 27.50

Extractive summarization using siamese hierarchical transformer encoders

Authors: González, José Ángel | Segarra, Encarna | García-Granada, Fernando | Sanchis, Emilio | Hurtado, Lluís-F.

Article Type: Research Article

Abstract: In this paper, we present an extractive approach to document summarization, the Siamese Hierarchical Transformer Encoders system, that is based on the use of siamese neural networks and the transformer encoders which are extended in a hierarchical way. The system, trained for binary classification, is able to assign attention scores to each sentence in the document. These scores are used to select the most relevant sentences to build the summary. The main novelty of our proposal is the use of self-attention mechanisms at sentence level for document summarization, instead of using only attentions at word level. The experimentation carried out using the CNN/DailyMail summarization corpus shows promising results in line with the state-of-the-art.

Keywords: Siamese neural networks, self attention, extractive summarization

DOI: 10.3233/JIFS-179901

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2409-2419, 2020

Price: EUR 27.50

Determining the importance of sentence position for automatic text summarization

Authors: Mendoza, Griselda Areli Matias | Ledeneva, Yulia | García-Hernández, Rene Arnulfo

Article Type: Research Article

Abstract: The methods of Automatic Extractive Summarization (AES) use the features of the sentences of the original text to extract the most important information that will be included in the summary. It is known that the first sentences of the text are more relevant than the rest of the text (this heuristic is called baseline), so the position of the sentence (in reverse order) is used to determine its relevance, which means that the last sentences have practically no possibility of being selected. In this paper, we present a way to soften the importance of sentences according to the position. The comprehensive tests were done on one of the best AES methods using the bag of words and n-grams models with the DUC02 and DUC01 data sets to determine the importance of sentences.

Keywords: Automatic Text Summarization, n-gram Model, bag of words model, slope calculation, genetic algorithm

DOI: 10.3233/JIFS-179902

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2421-2431, 2020

Price: EUR 27.50

A grammar for specifying full-body gestures elicited for abstract tasks

Authors: Céspedes-Hernández, David | González-Calleros, Juan Manuel | Guerrero-García, Josefina | Vanderdonckt, Jean

Article Type: Research Article

Abstract: A gesture elicitation study is a popular method in which a sample of end users is asked to propose gestures for executing functions in a certain context of use, specified by its users and their functions, the device or the platform used, and the physical environment in which they are working. Gestures proposed in such a study need to be classified and, perhaps, extended in order to feed a gesture recognizer. To support this process, we conducted a full-body gesture elicitation study for executing functions in a smart home environment by domestic end users in front of a camera. Instead of defining functions opportunistically, we define them based on a taxonomy of abstract tasks. From these elicited gestures, an XML-compliant grammar for specifying the resulting gestures is defined, created, and implemented to graphically represent, label, characterize, and formally present such full-body gestures. The formal notation for specifying such gestures is also useful for generating variations of elicited gestures to be applied on the fly to gestures in order to allow one-shot learning.

Keywords: Gesture elicitation study, gesture grammar, gesture recognition, gesture user interfaces, engineering interactive computing systems, one-shot learning

DOI: 10.3233/JIFS-179903

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2433-2444, 2020

Price: EUR 27.50

Towards building a Urdu Language Corpus using Common Crawl

Authors: Shafiq, Hafiz Muhammad | Tahir, Bilal | Mehmood, Muhammad Amir

Article Type: Research Article

Abstract: Urdu is the most popular language in Pakistan and is spoken by millions of people across the globe. While English is considered the dominant web content language, the characteristics of Urdu web content are still unknown. In this paper, we study the World-Wide-Web (WWW) by focusing on the content present in the Perso-Arabic script. Leveraging the Common Crawl Corpus, the largest publicly available collection of web content, with 2.87 billion documents for the period of December 2016, we examine different aspects of Urdu web content. We use the Compact Language Detector (CLD2) for language detection. We find that Urdu web content has a share of 0.04% of the global WWW with respect to document frequency. The .com, .org, and .info domains account for 70.9% of the top-level domains of Urdu content. Besides, urdulughat is the most dominant second-level domain. 40% of the domains are hosted in the United States while only 0.33% are hosted within Pakistan. Moreover, 25.68% of web pages have Urdu as the primary language and only 11.78% of web pages are exclusively in Urdu. Our Urdu corpus consists of 1.25 billion total and 18.14 million unique tokens. Furthermore, the corpus follows Zipf's law. This Urdu corpus can be used for text summarization, text classification, and cross-lingual information retrieval.

Keywords: Urdu web corpus, Perso-Arabic script, web content analysis, common crawl corpus

DOI: 10.3233/JIFS-179904

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2445-2455, 2020

Price: EUR 27.50
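
The Zipf's law claim in the Urdu corpus abstract above can be illustrated with a short check. The following is only a sketch under stated assumptions, not the authors' procedure: the function name `zipf_slope` and the toy token list are hypothetical, and the fit is a plain least-squares line through the log-rank/log-frequency points. A corpus that follows Zipf's law gives a slope close to -1.

```python
import math
from collections import Counter


def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank).

    Hypothetical helper for illustration; a slope near -1 is what
    Zipf's law predicts for natural-language token counts.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct tokens to fit a line")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy data (not from the paper): the word of rank r occurs about 1200 / r times.
toy_tokens = []
for rank in range(1, 21):
    toy_tokens.extend([f"w{rank}"] * (1200 // rank))

slope = zipf_slope(toy_tokens)
print(f"fitted slope: {slope:.3f}")
```

On a real corpus, `tokens` would be the output of whatever tokenizer is used for the language in question; the fit is sensitive to how the long tail of rare tokens is handled.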

“Bend the truth”: Benchmark dataset for fake news detection in Urdu language and its evaluation

Article Type: Research Article

Abstract: The paper presents a new corpus for fake news detection in the Urdu language along with the baseline classification and its evaluation. With the escalating use of the Internet worldwide and the substantially increasing impact produced by the availability of ambiguous information, the challenge of quickly identifying fake news in digital media in various languages becomes more acute. We provide a manually assembled and verified dataset containing 900 news articles, 500 annotated as real and 400 as fake, allowing the investigation of automated fake news detection approaches in Urdu. The news articles in the truthful subset come from legitimate news sources, and their validity has been manually verified. In the fake subset, the known difficulty of finding fake news was solved by hiring professional journalists native in Urdu who were instructed to intentionally write deceptive news articles. The dataset contains 5 different topics: (i) Business, (ii) Health, (iii) Showbiz, (iv) Sports, and (v) Technology. To establish our Urdu dataset as a benchmark, we performed baseline classification. We crafted a variety of text representation feature sets including word n-grams, character n-grams, functional word n-grams, and their combinations. After applying a variety of feature weighting schemes, we ran a series of classifiers on the train-test split. The results show sizable performance gains by the AdaBoost classifier, with an F1 of 0.87 on the fake class and 0.90 on the real class. We provide the results evaluated against different metrics for a convenient comparison of future research. The dataset is publicly available for research purposes.

Keywords: Fake news detection, urdu corpus, language resources, benchmark dataset, classification, machine learning

DOI: 10.3233/JIFS-179905

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2457-2469, 2020

Price: EUR 27.50

Revisiting subject classification in academic databases: A comparison of the classification accuracy of Web of Science, Scopus & Dimensions

Authors: Singh, Prashasti | Piryani, Rajesh | Singh, Vivek Kumar | Pinto, David

Article Type: Research Article

Abstract: Classification of research articles into different subject areas is an extremely important task in bibliometric analysis and information retrieval. There are primarily two kinds of subject classification approaches used in different academic databases: journal-based (aka source-level) and article-based (aka publication-level). The two popular academic databases, Web of Science and Scopus, use a journal-based subject classification scheme, which assigns articles to a subject based on the subject category assigned to the journal in which they are published. On the other hand, the recently introduced Dimensions database is the first large academic database that uses an article-based subject classification scheme, which assigns the article to a subject category based on its contents. Though the subject classification schemes of Web of Science have been compared in several studies, no research studies have compared the article-based and journal-based subject classification systems in different academic databases. This paper aims to compare the accuracy of the subject classification systems of the three popular academic databases, Web of Science, Scopus and Dimensions, through a large-scale user-based study. Results show that the commonly held belief in the superiority of article-based over journal-based subject classification does not hold, at least at the moment, as Web of Science appears to have the most accurate subject classification.

Keywords: Academic databases, research category, subject classification

DOI: 10.3233/JIFS-179906

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2471-2476, 2020

Price: EUR 27.50

Measuring interdisciplinarity of research articles: An analysis of inter-relatedness of different parameters

Authors: Karmakar, Mousumi | Singh, Vivek Kumar | Pinto, David

Article Type: Research Article

Abstract: With the evolution of knowledge disciplines and the cross-fertilization of ideas, research outputs reported as scientific papers are becoming more and more interdisciplinary. An interdisciplinary research work usually involves ideas and approaches from multiple disciplines of knowledge applied to solve a specific problem. In many cases the interdisciplinary areas eventually emerge as full-fledged disciplines. In the last two decades, several approaches have been proposed to measure the interdisciplinarity of a scientific article, such as propositions based on authorship, references, set of keywords, etc. Among all these approaches, the reference-set based approach is the most widely used. The diversity of knowledge in the reference set has been measured with three parameters, namely variety, balance, and disparity. Different studies have tried to combine these measures in one way or another to propose an aggregate measure of interdisciplinarity, called integrated diversity. However, there is a lack of understanding of the inter-relations between these parameters. This paper looks into the inter-relatedness of the three parameters through an analytical study of an important interdisciplinary research area, the Internet of Things (IoT). Research articles in IoT, as obtained from Web of Science for the year 2018, have been analyzed to compute the three measures and understand their inter-relatedness. Results obtained show that variety and balance are negatively correlated, variety and disparity do not show a stable relatedness, and balance and disparity are negatively correlated. Further, the integrated diversity measure is negatively correlated with variety and weakly positively correlated with balance and disparity. The results imply that the composite integrated diversity measure may not be a suitably constructed composite measure of interdisciplinarity.

Keywords: Diversity, interdisciplinarity, interdisciplinary research, multidisciplinary research

DOI: 10.3233/JIFS-179907

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2477-2485, 2020

Price: EUR 27.50
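
The three reference-set parameters named in the interdisciplinarity abstract above (variety, balance, disparity) can be made concrete with a small sketch. The definitions used here are common choices and are assumptions, not taken from the paper: variety as the number of subject categories cited, balance as Shannon evenness, and disparity as the mean pairwise distance between the cited categories. The function names, the category counts, and the distance values are all hypothetical.

```python
import math
from itertools import combinations


def variety(counts):
    """Number of subject categories that appear in the reference set."""
    return sum(1 for n in counts.values() if n > 0)


def balance(counts):
    """Shannon evenness in [0, 1]; 1.0 means references are spread evenly."""
    total = sum(counts.values())
    props = [n / total for n in counts.values() if n > 0]
    if len(props) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in props)
    return entropy / math.log(len(props))


def disparity(counts, distance):
    """Mean pairwise distance between the categories that are present."""
    present = sorted(cat for cat, n in counts.items() if n > 0)
    pairs = list(combinations(present, 2))
    if not pairs:
        return 0.0
    return sum(distance[frozenset(pair)] for pair in pairs) / len(pairs)


# Hypothetical reference set of one article: category -> number of references.
ref_counts = {"Computer Science": 6, "Engineering": 3, "Sociology": 1}

# Hypothetical category distances in [0, 1] (for example, 1 - cosine similarity).
cat_distance = {
    frozenset({"Computer Science", "Engineering"}): 0.2,
    frozenset({"Computer Science", "Sociology"}): 0.9,
    frozenset({"Engineering", "Sociology"}): 0.8,
}

print(variety(ref_counts))
print(round(balance(ref_counts), 3))
print(round(disparity(ref_counts, cat_distance), 3))
```

Computing the three values per article and then correlating them across a collection is one way to study how the parameters relate to each other.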

Improving unsupervised neural aspect extraction for online discussions using out-of-domain classification

Authors: Alekseev, Anton | Tutubalina, Elena | Malykh, Valentin | Nikolenko, Sergey

Article Type: Research Article

Abstract: Deep learning architectures based on self-attention have recently achieved and surpassed state-of-the-art results in the task of unsupervised aspect extraction and topic modeling. While models such as neural attention-based aspect extraction (ABAE) have been successfully applied to user-generated texts, they are less coherent when applied to traditional data sources such as news articles and newsgroup documents. In this work, we introduce a simple approach based on sentence filtering in order to improve topical aspects learned from newsgroups-based content without modifying the basic mechanism of ABAE. We train a probabilistic classifier to distinguish between out-of-domain texts (outer dataset) and in-domain texts (target dataset). Then, during data preparation we filter out sentences that have a low probability of being in-domain and train the neural model on the remaining sentences. The positive effect of sentence filtering on topic coherence is demonstrated in comparison to aspect extraction models trained on unfiltered texts.

Keywords: Aspect extraction, out-of-domain classification, deep learning, topic models, topic coherence

DOI: 10.3233/JIFS-179908

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2487-2496, 2020

Price: EUR 27.50
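
The sentence-filtering step described in the aspect extraction abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract only says "a probabilistic classifier", so a multinomial Naive Bayes model with add-one smoothing is swapped in here, and the function names, the 0.5 threshold, and the toy sentences are all hypothetical.

```python
import math
from collections import Counter


def train_nb(in_domain, out_domain):
    """Collect per-class word counts for a two-class Naive Bayes model."""
    counts = {"in": Counter(), "out": Counter()}
    for label, sentences in (("in", in_domain), ("out", out_domain)):
        for sentence in sentences:
            counts[label].update(sentence.lower().split())
    vocab = set(counts["in"]) | set(counts["out"])
    n_docs = {"in": len(in_domain), "out": len(out_domain)}
    return counts, vocab, n_docs


def p_in_domain(sentence, model):
    """Posterior probability that a sentence is in-domain."""
    counts, vocab, n_docs = model
    log_score = {}
    for label in ("in", "out"):
        total = sum(counts[label].values())
        score = math.log(n_docs[label] / (n_docs["in"] + n_docs["out"]))
        for word in sentence.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((counts[label][word] + 1) / (total + len(vocab)))
        log_score[label] = score
    # Turn the two log scores into a normalised probability.
    top = max(log_score.values())
    exp_in = math.exp(log_score["in"] - top)
    exp_out = math.exp(log_score["out"] - top)
    return exp_in / (exp_in + exp_out)


def filter_sentences(sentences, model, threshold=0.5):
    """Keep only sentences whose in-domain probability reaches the threshold."""
    return [s for s in sentences if p_in_domain(s, model) >= threshold]


# Toy data (hypothetical): the target domain is computer hardware.
target = ["the graphics card driver crashed",
          "install the new video driver",
          "the monitor resolution is wrong"]
outer = ["the team won the hockey game",
         "the goalie saved the game",
         "playoff tickets are sold out"]

model = train_nb(target, outer)
kept = filter_sentences(["my video driver crashed again",
                         "what a great hockey game"], model)
print(kept)
```

The sentences that survive the filter would then be passed to the aspect extraction model for training; the threshold controls how aggressively borderline sentences are dropped.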

Career path level estimation and skill qualification feedback from textual descriptions

Authors: Duchanoy, Carlos A. | Moreno-Armendáriz, Marco A. | Calvo, Hiram | Hernández-Ramos, Víctor E.

Article Type: Research Article

Abstract: LinkedIn is a social medium oriented to professional career handling and networking. In it, users write a textual profile on their experience, and add skill labels in a free format. Users are able to apply for different jobs, but specific feedback on the appropriateness of their application according to their skills is not provided to them. In this work we particularly focus on applicants of the project management branch from information technologies (although the presented methodology could be extended to any area following the same mechanism). Using the information users provide in their profile, it is possible to establish the corresponding level in a predefined Project Manager career path (PM level). 1500+ experiences and skills from 300 profiles were manually tagged to train and test a model to automatically estimate the PM level. In this proposal we were able to perform such prediction with a precision of 98%. Additionally, the proposed model is able to provide feedback to users by offering a guideline of necessary skills to be learned to fulfill the current PM level, or those needed in order to upgrade to the following PM level. This is achieved through the clustering of skill qualification labels. Results of experiments with several clustering algorithms are provided as part of this work.

Keywords: Project manager career path level, profile classification, skill qualification estimation, natural language processing, word embeddings

DOI: 10.3233/JIFS-179909

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2497-2507, 2020

Price: EUR 27.50

An unsupervised lower-baseline localization method based on writing style features for historical documents

Authors: García-Calderón, Miguel Ángel | García-Hernández, René Arnulfo | Ledeneva, Yulia

Article Type: Research Article

Abstract: There is a lot of cultural heritage information in historical documents that has not been explored or exploited yet. Lower-Baseline Localization (LBL) is the first step in information retrieval from images of manuscripts, where groups of handwritten text lines representing a message are identified. An LBL method is described depending on how the features of the writing style of an author are treated: the character shape and size, gap between characters and between lines, the shape of ascendant and descendant strokes, character body, space between characters, words and columns, and touching and overlapping lines. For example, most of the supervised LBL methods only analyze the gap between characters as part of the preprocessing phase of the document, and the rest of the features of the writing style of the author are left for the learning phase of the classifier. For this reason, supervised LBL methods tend to learn particular styles and collections. This paper presents an unsupervised LBL method that explicitly analyzes all the features of the writing style of the author and processes the document by windows. In this sense, the proposed method is more independent of the writing style of the author, and it is more reliable with new collections in real scenarios. According to the experimentation, the proposed method surpasses the state-of-the-art methods on the standard READ-BAD historical collection, with 2,036 manuscripts and 132,124 manually annotated baselines from 9 libraries spanning 500 years.

Keywords: Lower-baseline localization, historical document analysis, text line segmentation, writing style features

DOI: 10.3233/JIFS-179910

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2509-2520, 2020

Price: EUR 27.50

Solving arithmetic word problems: A deep learning based approach

Authors: Mandal, Sourav | Sekh, Arif Ahmed | Naskar, Sudip Kumar

Article Type: Research Article

Abstract: This paper presents a novel deep learning based approach to solving arithmetic word problems. Solving different types of mathematical (math) word problems (MWP) is a very complex and challenging task as it requires Natural Language Understanding (NLU) and commonsense knowledge. An application of this can benefit learning (education) technologies such as e-learning systems, intelligent tutoring, Learning Management Systems (LMS), innovative teaching/learning, etc. We propose Deep Learning based Arithmetic Word Problem Solver, DLAWPS, an intelligent MWP solver system. DLAWPS consists of a Recurrent Neural Network (RNN) based Bi-directional Long Short-Term Memory (BiLSTM) to classify the operation among the four basic operations {+, -, *, /}, and a knowledge-based irrelevant information removal unit (IIRU) to identify the relevant quantities to form an equation to solve arithmetic MWPs. Our system generates state-of-the-art results on the standard arithmetic word problem datasets: AddSub, SingleOp, and a Combined dataset.

Keywords: Solving arithmetic word problems, solving math word problems, BiLSTM-based operation prediction, irrelevant information removal

DOI: 10.3233/JIFS-179911

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2521-2531, 2020

Price: EUR 27.50

Brain signal classification for creative tasks

Article Type: Research Article

Abstract: We tried to determine if emotive self-feedback from conscious assessment of artists’ own works generates sufficient impetus for accomplishment of goals. Self-reports from participants of an ‘experimental’ group working independently and without external feedback on their work are examined. The performance of this group is compared to ‘control’ performers in tutored sessions (with external feedback). On the whole a two-fold analysis was carried out. First, verbal reports of the participants’ feelings about their work in both experimental and control settings were analyzed. Second, a brainwave analysis of each participant was conducted while they were engaged in the same tasks so as to examine effects of concentration and energy output. The Hilbert-Huang transform was used to filter data frequency for brainwaves emitted at channels AF4, AF3, F6 and F7, all positioned along the pre-frontal cortex. Results of participants’ brainwave behavior within frequency ranges of 14–16 Hz, as well as for higher ranges (above 60 Hz), do not show a significant difference between the two groups. This indicates that brainwave activity is sustained in individuals who depend on self-feedback appraisals, at least within the domain of creativity investigated in this paper.

Keywords: Affect, creativity, external feedback, flow, motivation, self-feedback

DOI: 10.3233/JIFS-179912

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2533-2544, 2020

Price: EUR 27.50

PROMISE: PRoposing an Ontological Model for developing collaboratIve SystEms

Authors: Anzures-García, Mario | Sánchez-Gálvez, Luz A.

Article Type: Research Article

Abstract: Currently, there is a great need in organizations to support communication, collaboration, and coordination (important aspects that characterize collaborative systems) between their workers and enterprises, in order to simplify and improve their production processes. However, the development and maintenance of these systems are very complex. Although several proposals to develop them have been made, they usually lack theoretical models that allow specifying and creating both group and interactive activities in a conceptual and formal way to sustain the requirements of group work. Therefore, this paper PRoposes an Ontological Model for developing collaboratIve SystEms (PROMISE), which aims to be a guide for the analysis, design, and implementation of such systems in a formal, explicit manner. This model is based on an ontology, created using OWL (Web Ontology Language), providing a model of knowledge about how entities should be used and combined to control the execution of a set of ordered steps to develop these systems. Furthermore, this ontology has been validated through a set of academic projects, showing it to be of great use to developers.

Keywords: Ontological model, computer supported cooperative work, collaborative system, web ontology language, PROMISE

DOI: 10.3233/JIFS-179913

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2545-2557, 2020

Price: EUR 27.50

Digital processing of ultrasound images on dilated blood vessels from diabetic patients

Article Type: Research Article

Abstract: Introduction: Peripheral arterial disease (PAD) is a fairly common degenerative vascular condition in diabetic patients that leads to inadequate blood flow (BF). This disease is mainly due to atherosclerosis, which causes chronic narrowing of arteries and can precipitate acute thrombotic events. In patients with diabetes, atherosclerosis is the main reason for reduced life expectancy, while diabetic nephropathy and retinopathy are the largest contributors to end-stage renal disease and blindness, respectively. Objective: To assess the dilatation of the blood vessels in diabetic patients vs. healthy volunteers by using digital image processing. Materials and Methods: The study subject was ultrasound image processing of blood vessel dilation in the lower extremities of diabetic patients; the results were compared with ultrasound images of healthy subjects. Results: The digital image processing suggests that there is a significant difference between the images of the diabetic (experimental) group and the images of the healthy volunteers (control group). Discussion: The digital image processing performed on the Matlab platform is an adequate procedure for analyzing blood vessel dilation in ultrasound images taken from the lower extremities of diabetic patients.

Keywords: Lower extremity, diabetic patients, images processing, magnetic field, patch

DOI: 10.3233/JIFS-179914

Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2559-2564, 2020

Price: EUR 27.50
