Affiliations: [a] MIT World Peace University, Kothrud, Pune 411038, India | [b] GITAM School of Technology, GITAM Deemed to be University, GITAM University, Rudraram, Telangana 502329, India
Abstract: Data clustering, the process of retrieving essential information from a dataset, is a significant data mining approach. In recent decades, nature-inspired optimization algorithms have been designed to solve optimization problems, particularly those arising in data clustering. However, the existing methods are not feasible for processing large amounts of data, because the execution time of the traditional approaches is high. Hence, an efficient and optimal data clustering scheme is designed using the devised Fractional Sail Fish-Sparse Fuzzy C-Means + Particle Whale Optimization (FSF-Sparse FCM + PWO) based MapReduce Framework (MRF) to process high-dimensional data. The proposed FSF-Sparse FCM is designed by integrating Sail Fish Optimization (SFO) with the fractional concept and Sparse FCM. The proposed MRF comprises two functions, the mapper function and the reducer function, which together perform the data clustering. The proposed FSF-Sparse FCM is employed in the mapper phase to compute the cluster centroids, which form the intermediate data. The intermediate data is tuned in the reducer phase using Particle Whale Optimization (PWO), which integrates Particle Swarm Optimization (PSO) and the Whale Optimization Algorithm (WOA). Accordingly, the optimal cluster centroid is computed in the reducer phase using an objective function based on the DB-Index. The proposed FSF-Sparse FCM + PWO obtained the highest accuracy of 0.903 and the lowest DB-Index of 39.07.
Keywords: Data clustering, big data, MapReduce framework, sparse FCM, sail fish optimization (SFO)
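Illustrative sketch: The mapper/reducer split described in the abstract can be illustrated with a minimal, single-process Python sketch. It is not the proposed method: plain k-means stands in for FSF-Sparse FCM in the mapper, and a simple "pick the candidate with the lowest DB-Index" rule stands in for the PWO search in the reducer. All function names (`mapper`, `reducer`, `db_index`) and the toy data are assumptions made for illustration only.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def assign(points, centroids):
    """Group points by the index of their nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        j = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        clusters[j].append(p)
    return clusters

def mapper(partition, k, iters=10, seed=0):
    """Compute k centroids for one data partition.
    Assumption: plain k-means replaces FSF-Sparse FCM here."""
    rng = random.Random(seed)
    centroids = rng.sample(partition, k)
    for _ in range(iters):
        clusters = assign(partition, centroids)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

def db_index(points, centroids):
    """Davies-Bouldin index; lower means compact, well-separated clusters."""
    clusters = assign(points, centroids)
    scatter = [sum(dist(p, c) for p in pts) / len(pts) if pts else 0.0
               for pts, c in zip(clusters, centroids)]
    k = len(centroids)
    total = 0.0
    for i in range(k):
        worst = 0.0
        for j in range(k):
            if j == i:
                continue
            d = dist(centroids[i], centroids[j])
            if d == 0.0:
                return math.inf  # coincident centroids: treat as worst case
            worst = max(worst, (scatter[i] + scatter[j]) / d)
        total += worst
    return total / k

def reducer(candidates, points):
    """Select the centroid set with the lowest DB-Index.
    Assumption: this selection rule replaces the PWO-based tuning."""
    return min(candidates, key=lambda c: db_index(points, c))

# Toy data: two well-separated groups, split into two partitions.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
partitions = [points[0::2], points[1::2]]

candidates = [mapper(part, k=2) for part in partitions]  # intermediate data
best = reducer(candidates, points)
print(best, db_index(points, best))
```

In a real MapReduce deployment each call to `mapper` would run on a separate node, and only the small centroid sets would be shipped to the reducer, which is what keeps the scheme practical for large datasets.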