Article type: Research Article
Authors: Sundarakumar, M.R. [a],* | Mahadevan, G. [b] | Natchadalingam, R. [c] | Karthikeyan, G. [d] | Ashok, J. [e] | Manoharan, J. Samuel [f] | Sathya, V. [g] | Velmurugadass, P. [h]
Affiliations: [a] Research Scholar, AMC Engineering College, Bangalore, India | [b] AMC Engineering College, Bangalore, India | [c] School of Computing and Information Technology, Reva University, Bengaluru, India | [d] Department of EEE, Sona College of Technology, Salem, Tamil Nadu, India | [e] Department of ECE, V.S.B Engineering College, Karur, Tamil Nadu, India | [f] Department of ECE, Sir Isaac Newton College of Engineering and Technology, Nagapattinam, Tamil Nadu, India | [g] Department of Artificial Intelligence and Data Science, Panimalar Engineering College, Chennai, India | [h] Department of Computer Science & Engineering, Kalasalingam Academy of Research and Education, Tamil Nadu, India
Correspondence: [*] Corresponding author. M.R. Sundarakumar, Research Scholar, AMC Engineering College, Bangalore, India. E-mail: sundar.infotechh@gmail.com.
Abstract: Processing large volumes of digital data from repositories is challenging because of the variety of data formats and extraction techniques available. On large networks, the accuracy and speed of data processing with modern tools are limited, which makes quick results hard to obtain. The major problems in extracting data from a repository are locating the data and handling dynamic changes to existing data. Although many researchers have created tools and algorithms for processing warehouse data, these have delivered neither accurate results nor low latency, an outcome attributed to batch processing over large networks. For the latest real-time applications to process huge datasets over the network, database scalability has to be tuned using powerful distributed frameworks and programming languages. In big data analytics, data processing has been done effectively with the modern tools Hadoop and Spark. Moreover, a programming language such as Python provides solutions based on the concepts of MapReduce and erasure coding, but it has challenges and limitations with huge datasets on network clusters. This review paper covers the features of Hadoop and Spark, as well as their challenges and limitations across criteria such as file size, file formats, and scheduling techniques. It presents a detailed survey of the challenges and limitations that arise during the processing phase of big data analytics and proposes solutions based on selecting suitable languages and techniques for use with modern tools. The paper offers researchers working in big data analytics ways to improve the speed of data processing by applying an appropriate algorithm to digital data in huge repositories.
Keywords: HADOOP, SPARK, scalability, batch processing, big-data
DOI: 10.3233/JIFS-223295
Journal: Journal of Intelligent & Fuzzy Systems, vol. 44, no. 3, pp. 5231-5255, 2023