Investigating cluster validation metrics for optimal number of clusters determination

Karanikola, Aikaterini; Liapis, Charalampos M.; Kotsiantis, Sotiris

doi:10.3233/IDT-210187

Issue title: Special Collection of Extended Selected Papers on Novel Research Results Presented in the IISA2021

Guest editors: George A. Tsihrintzis, Maria Virvou and Ioannis Hatzilygeroudis

Article type: Research Article

Authors: Karanikola, Aikaterini^* | Liapis, Charalampos M. | Kotsiantis, Sotiris

Affiliations: Department of Mathematics, University of Patras, Rion, Patras, Greece

Correspondence: [*] Corresponding author: Aikaterini Karanikola, Department of Mathematics, University of Patras, Rion, Patras, Greece. E-mail: karanikola@upatras.gr.

Abstract: Clustering is the process of partitioning a given set of objects into groups of highly related instances. Relatedness is determined by a specific distance metric, with which the intra-cluster similarity is estimated. Finding an optimal number of such partitions is usually the key step in the entire process, yet a rather difficult one: the term “optimal” is itself ambiguous, and selecting an unsuitable number of clusters may lead to incorrect conclusions and, consequently, to wrong decisions. Furthermore, inherent characteristics of the datasets, such as overlapping clusters or clusters containing subclusters, most often make the task harder. Thus, the methods used to detect similarities and the parameter selection of the partitioning algorithm have a major impact on the quality of the groups and on the identification of their optimal number. Given that each dataset constitutes a rather distinct case, validity indices were introduced as indicators to address the problem of selecting such an optimal number of clusters. In this work, an extensive set of well-known validity indices, based on the so-called relative criteria approach, is examined comparatively. A total of 26 cluster validation measures were investigated in two distinct case studies: one on real-world data and one on artificially generated data. To ensure a certain degree of difficulty, both the real-world and the generated data were selected to exhibit variations and inhomogeneity. Each index is deployed under 9 different clustering methods, which incorporate 5 different distance metrics. All results are presented in various explanatory forms.
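
To make the relative-criteria idea concrete, the following is a minimal sketch of how a single validity index can be used to pick the number of clusters: the same algorithm is run for several candidate values of k, and the k with the best index value is kept. It is an illustration only, not the authors' experimental setup. The choice of the silhouette index, plain k-means with Euclidean distance, the deterministic farthest-point initialization, the synthetic three-blob data, and all function names are assumptions made for this example; the abstract does not list the 26 indices, the 9 methods, or the 5 metrics that were studied.

```python
import math
import random


def farthest_point_init(points, k):
    """Pick k starting centers deterministically: each new center is the
    point farthest from the centers chosen so far (an assumption made here
    to avoid the instability of random initialization)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means with Euclidean distance; returns one label per point."""
    centers = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep the old center if a cluster empties
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette_index(points, labels):
    """Mean silhouette width over all points; higher means a better partition."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a score of 0
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Synthetic data (hypothetical): three well-separated Gaussian blobs in 2D.
rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [
    (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
    for cx, cy in blob_centers
    for _ in range(30)
]

# Relative criterion: evaluate every candidate k and keep the best-scoring one.
scores = {k: silhouette_index(points, kmeans(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

On clean, well-separated data such as this, most indices are expected to agree. The harder cases named in the abstract, overlapping clusters and clusters containing subclusters, are where different indices, clustering methods, and distance metrics may point to different values of k.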

Keywords: Clustering, cluster validation, validity index, relative criteria, number of clusters

Journal: Intelligent Decision Technologies, vol. 15, no. 4, pp. 809-824, 2021

Published: 10 January 2022
