Searching for just a few words should be enough to get started. If you need to make more complex queries, use the tips below to guide you.
Article type: Research Article
Authors: Sun, Haojun1; * | Chen, Rongbo1 | Qin, Yong2 | Wang, Shengrui3
Affiliations: [1] Department of Computer Science, Shantou University, Shantou, China | [2] State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China | [3] Department of Computer Science, University of Sherbrooke, Sherbrooke, QC, Canada, E-mails: haojunsun@stu.edu.cn, 13rbchen@stu.edu.cn, yqin@bjtu.edu.cn, shengrui.wang@usherbrooke.ca
Correspondence: [*] Corresponding author.
Abstract: Clustering high-dimensional data is a challenging task in data mining, and clustering high-dimensional categorical data is even more challenging because it is more difficult to measure the similarity between categorical objects. Most algorithms assume feature independence when computing similarity between data objects, or make use of computationally demanding techniques such as PCA for numerical data. Hierarchical clustering algorithms are often based on similarity measures computed on a common feature space, which is not effective when clustering high-dimensional data. Subspace clustering algorithms discover feature subspaces for clusters, but are mostly partition-based; i.e. they do not produce a hierarchical structure of clusters. In this paper, we propose a hierarchical algorithm for clustering high-dimensional categorical data, based on a recently proposed information-theoretical concept named holo-entropy. The algorithm proposes new ways of exploring entropy, holo-entropy and attribute weighting in order to determine the feature subspace of a cluster and to merge clusters even though their feature subspaces differ. The algorithm is tested on UCI datasets, and compared with several state-of-the-art algorithms. Experimental results show that the proposed algorithm yields higher efficiency and accuracy than the competing algorithms and allows higher reproducibility.
Keywords: hierarchical clustering, holo-entropy, subspace, categorical data
Journal: Informatica, vol. 28, no. 2, pp. 303-328, 2017
IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
USA
Tel: +1 703 830 6300
Fax: +1 703 830 2300
sales@iospress.com
For editorial issues, like the status of your submitted paper or proposals, write to editorial@iospress.nl
IOS Press
Nieuwe Hemweg 6B
1013 BG Amsterdam
The Netherlands
Tel: +31 20 688 3355
Fax: +31 20 687 0091
info@iospress.nl
For editorial issues, permissions, book requests, submissions and proceedings, contact the Amsterdam office info@iospress.nl
Inspirees International (China Office)
Ciyunsi Beili 207(CapitaLand), Bld 1, 7-901
100025, Beijing
China
Free service line: 400 661 8717
Fax: +86 10 8446 7947
china@iospress.cn
For editorial issues, like the status of your submitted paper or proposals, write to editorial@iospress.nl
如果您在出版方面需要帮助或有任何建, 件至: editorial@iospress.nl