Mining skyline frequent-utility patterns from big data environment based on MapReduce framework

Wu, Jimmy Ming-Tai; Li, Ranran; Wu, Mu-En; Lin, Jerry Chun-Wei

doi:10.3233/IDA-220756

Article type: Research Article

Authors: Wu, Jimmy Ming-Tai^{a; *} | Li, Ranran^a | Wu, Mu-En^b | Lin, Jerry Chun-Wei^c

Affiliations: [a] Department of Information Management, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan | [b] Department of Information and Finance Management, National Taipei University of Technology, Taipei, Taiwan | [c] Department of Computer Science, Western Norway University of Applied Sciences, Bergen, Norway

Correspondence: [*] Corresponding author: Jimmy Ming-Tai Wu, Department of Information Management, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan. E-mail: wmt@wmt35.idv.tw.

Abstract: Frequent itemset mining (FIM) and high-utility itemset mining (HUIM) are two of the most widely studied problems in data mining. Many algorithms have been proposed to reveal the relationships among utility, frequency, and items in transaction databases. Although these algorithms mine frequent or high-utility itemsets quickly, each considers only a single criterion, frequency or utility, even though other factors (e.g., distance, price) can also be valuable for decision-making. A skyline framework was therefore introduced to mine skyline frequent-utility patterns (SFUPs), which take both frequency and utility into account to better support user decision-making, and several algorithms for it have since been proposed. However, the Internet of Things (IoT), the mobile Internet, and the traditional Internet generate massive amounts of data every day, and these state-of-the-art standalone algorithms cannot meet the challenge of finding interesting patterns in data at that scale. Big-data systems instead use a distributed, cloud-based architecture to filter and process such data and extract useful information. This paper proposes a novel parallel algorithm for Hadoop, designed as a three-stage iterative algorithm based on MapReduce. MapReduce divides the mining task for the whole dataset into multiple independent sub-tasks, so that frequent and high-utility patterns are found in parallel. Extensive experiments show that the algorithm can handle large datasets and performs well on Hadoop clusters.
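
To make the skyline idea in the abstract concrete, the following is a minimal sketch and not the authors' three-stage algorithm. It assumes the usual dominance notion, in which a pattern is kept only if no other pattern is at least as good in both frequency and utility and different in at least one. The toy transactions, the two-way partitioning, and the function names (`map_partition`, `reduce_counts`, `skyline`) are hypothetical, and the brute-force enumeration of itemsets is only feasible for tiny inputs. The map and reduce steps are simulated in memory rather than run on Hadoop.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: two partitions of a transaction database.
# Each transaction maps an item to the utility it contributes there.
PARTITIONS = [
    [{"a": 1, "b": 2}, {"a": 1, "c": 10}],
    [{"a": 1, "b": 2, "c": 10}, {"b": 2}],
]


def map_partition(transactions):
    """Map step (simulated): emit (itemset, (1, utility)) for every
    non-empty itemset contained in each transaction."""
    for t in transactions:
        items = sorted(t)
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                yield combo, (1, sum(t[i] for i in combo))


def reduce_counts(pairs):
    """Reduce step (simulated): sum frequency and utility per itemset."""
    totals = defaultdict(lambda: [0, 0])
    for itemset, (freq, util) in pairs:
        totals[itemset][0] += freq
        totals[itemset][1] += util
    return {itemset: tuple(v) for itemset, v in totals.items()}


def dominates(p, q):
    """p dominates q if p is no worse in both dimensions and not equal."""
    return p[0] >= q[0] and p[1] >= q[1] and p != q


def skyline(stats):
    """Keep the itemsets whose (frequency, utility) pair is not dominated."""
    return {
        itemset: fu
        for itemset, fu in stats.items()
        if not any(dominates(other, fu) for other in stats.values())
    }


def mine(partitions):
    """Run the simulated map over each partition, reduce, then filter."""
    emitted = (pair for part in partitions for pair in map_partition(part))
    return skyline(reduce_counts(emitted))


if __name__ == "__main__":
    for itemset, (freq, util) in sorted(mine(PARTITIONS).items()):
        print(itemset, "frequency =", freq, "utility =", util)
```

On this toy input, the itemset `('b',)` survives because it is the most frequent, and `('a', 'c')` survives because it has the highest utility; every other itemset is dominated by one of the two.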

Keywords: Data mining, skyline frequent-utility patterns (SFUPs), cloud computing, Hadoop, MapReduce

Journal: Intelligent Data Analysis, vol. 27, no. 5, pp. 1359-1377, 2023

Published: 6 October 2023
