Parallel and fault-tolerant k-means clustering based on the actor model

Taamneh, Salah; Qawasmeh, Ahmad; Aljammal, Ashraf H.

doi:10.3233/MGS-200336


Article type: Research Article

Authors: Taamneh, Salah* | Qawasmeh, Ahmad | Aljammal, Ashraf H.

Affiliations: Department of Computer Science and Applications, The Hashemite University, Zarqa, Jordan

Correspondence: [*] Corresponding author: Salah Taamneh, Department of Computer Science and Applications, The Hashemite University, Zarqa, Jordan. Tel.: +962 790107961; Fax: +962 53826625; E-mail: taamneh@hu.edu.jo.

Abstract: The k-means algorithm is a well-known unsupervised machine learning tool that splits a given dataset into a fixed number of clusters via iterative refinement. Running such an algorithm on today's datasets, which are characterized by high dimensionality and huge size, requires fault-tolerance mechanisms to mitigate the impact of possible failures. In this paper, we propose an actor-based implementation of the k-means algorithm. The algorithm was made fault-tolerant by periodically saving the centroids to stable storage during failure-free execution and restarting from the last saved centroids upon a failure. This was implemented in two different ways: pessimistic checkpointing (blocking) and optimistic checkpointing (non-blocking). The actor-based k-means algorithm was evaluated on a machine with eight cores. The experiments showed that the proposed algorithm scales very well as the number of workers increases and can be up to ~2x faster than a Java-thread-based implementation of the k-means algorithm. The results also showed that the optimistic algorithm outperformed the pessimistic one, particularly in the presence of competing I/O operations. Several failures were forced to occur during execution to evaluate the performance of the fault-tolerant implementations. The experiments showed that the average amount of lost work ranged from 3% to 6%.
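
The checkpoint-and-restart idea described in the abstract can be illustrated with a minimal, single-process Python sketch. It is not the authors' actor-based implementation: the function names, the JSON checkpoint format, the `every` interval, and the `fail_at` failure-injection parameter are all assumptions made for illustration. The sketch shows only a blocking checkpoint, where the iteration waits until the centroids are safely on disk before continuing.

```python
import json
import os


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])))


def update(points, centroids):
    """One refinement step: assign every point, then recompute each centroid as a mean."""
    sums = [[0.0] * len(centroids[0]) for _ in centroids]
    counts = [0] * len(centroids)
    for point in points:
        i = nearest(point, centroids)
        counts[i] += 1
        sums[i] = [s + p for s, p in zip(sums[i], point)]
    # An empty cluster keeps its previous centroid.
    return [[s / counts[i] for s in sums[i]] if counts[i] else list(centroids[i])
            for i in range(len(centroids))]


def save_checkpoint(path, iteration, centroids):
    """Blocking checkpoint (hypothetical format): write a temp file, sync, then rename."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"iteration": iteration, "centroids": centroids}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename, so a crash never leaves a half-written file


def load_checkpoint(path):
    """Return (iteration, centroids) from the last checkpoint, or None if there is none."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        state = json.load(f)
    return state["iteration"], state["centroids"]


def kmeans(points, initial, path, max_iters=100, every=5, fail_at=None):
    """Run k-means, resuming from the checkpoint at `path` if one exists.

    `fail_at` is an illustrative hook that raises before the given iteration
    to simulate a crash; it is not part of any real API.
    """
    state = load_checkpoint(path)
    start, centroids = state if state else (0, [list(c) for c in initial])
    for it in range(start, max_iters):
        if fail_at is not None and it == fail_at:
            raise RuntimeError("simulated failure")
        new = update(points, centroids)
        converged = new == centroids
        centroids = new
        if (it + 1) % every == 0:
            save_checkpoint(path, it + 1, centroids)
        if converged:
            break
    return centroids
```

Calling `kmeans` again with the same `path` after a failure resumes from the last saved centroids, so only the iterations since that checkpoint are repeated. A non-blocking variant would hand the write to a background task instead of waiting for it inside the loop.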

Keywords: Parallel k-means, actor-model, checkpointing


Journal: Multiagent and Grid Systems, vol. 16, no. 4, pp. 379-396, 2020

Received 13 February 2020

Accepted 8 September 2020

Published: 31 December 2020
