Issue title: Selected papers of KES2012 - Part 2 of 2
Guest editors: M. Graña, A.I. Gonzalez-Acuña and C. Zanni-Merk
Article type: Research Article
Authors: Stankov, Ivan* | Todorov, Diman | Setchi, Rossitza
Affiliations: Knowledge Engineering Systems Group, School of Engineering, Cardiff University, Cardiff, UK | Computational Intelligence Group, University of the Basque Country, UPV/EHU, Spain
Correspondence: [*] Corresponding author: Ivan Stankov, Knowledge Engineering Systems Group, School of Engineering, Cardiff University, The Parade, CF24 3AA, UK. Tel.: +44 (0)29 2087 6060; E-mail: stankovid@cardiff.ac.uk
Abstract: The aim of document clustering is to produce coherent clusters of similar documents. Clustering algorithms rely on text normalisation techniques to represent and cluster documents. Although most document clustering algorithms perform well in specific knowledge domains, processing cross-domain document repositories is still a challenge. This paper attempts to address this challenge. It investigates the performance of the sk-means clustering algorithm across domains by comparing the cluster coherence produced with semantic-based and traditional (TF-IDF-based) document representations. The evaluation is conducted on 20 different generic sub-domains of a thousand documents, each randomly selected from the Reuters-21578 corpus. The experimental results demonstrate that clusters produced with a semantically enhanced text stemmer (SETS) are more coherent than those produced with text normalisation based on the Porter stemmer. In addition, semantic-based text normalisation is shown to be resistant to noise, which is often introduced in the index aggregation stage, the stage that acquires features to represent documents.
Keywords: Semantics, stemming, cluster coherency, partitional clustering
DOI: 10.3233/KES-130267
Journal: International Journal of Knowledge-based and Intelligent Engineering Systems, vol. 17, no. 2, pp. 113-126, 2013
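For readers unfamiliar with the baseline setup the abstract mentions, the following is a minimal sketch of clustering TF-IDF document vectors by cosine similarity, in the style of spherical k-means. It is not the authors' implementation and does not reproduce SETS. The function names (`tfidf_vectors`, `normalise`, `cosine`, `spherical_kmeans`), the seeding with the first k documents, and the assumption that documents arrive already tokenised and stemmed are all illustrative choices, not details taken from the paper.

```python
import math
from collections import Counter


def normalise(vec):
    """Scale a sparse vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs):
    """Return unit-length TF-IDF vectors (dicts) for already tokenised documents."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: count * math.log(n / df[t]) for t, count in tf.items()}
        vectors.append(normalise(vec))
    return vectors


def cosine(a, b):
    """Sparse dot product; equals cosine similarity when both vectors are unit length."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def spherical_kmeans(vectors, k, iterations=20):
    """Assign each unit vector to the most similar centroid; returns one label per vector."""
    # Assumption: naive deterministic seeding with the first k documents.
    centroids = [dict(v) for v in vectors[:k]]
    labels = [-1] * len(vectors)
    for _ in range(iterations):
        new_labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == j:
                    for t, w in v.items():
                        total[t] += w
            if total:
                # Re-project the summed members onto the unit sphere.
                centroids[j] = normalise(total)
    return labels


docs = [
    ["oil", "price", "barrel", "oil"],
    ["wheat", "grain", "harvest"],
    ["oil", "barrel", "export"],
    ["grain", "harvest", "wheat", "export"],
]
labels = spherical_kmeans(tfidf_vectors(docs), k=2)
print(labels)
```

In this toy input the tokens stand in for stemmer output; swapping the stemmer changes which surface forms collapse into the same token, which is the variable the paper's comparison turns on.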