ETKDS: An efficient algorithm of Top-K high utility itemsets mining over data streams under sliding window model

Cheng, Haodong; Han, Meng; Zhang, Ni; Wang, Le; Li, Xiaojuan

doi:10.3233/JIFS-210610

Article type: Research Article

Authors: Cheng, Haodong | Han, Meng [*] | Zhang, Ni | Wang, Le | Li, Xiaojuan

Affiliations: School of Computer Science and Engineering, North Minzu University, Yinchuan, China

Correspondence: [*] Corresponding author. Meng Han, School of Computer Science and Engineering, North Minzu University, Ning Xia, China. E-mails: 2003051@nmu.edu.cn, 734811467@qq.com.

Abstract: Researchers have proposed the concept of Top-K high-utility itemset mining over data streams, in which users directly specify the number K of high-utility itemsets they wish to obtain, with no need to set a minimum utility threshold. Current Top-K high-utility itemset mining algorithms over data streams suffer from several problems, including a complex construction process for the storage structure, inefficient threshold-raising and utility pruning strategies, and a large search space, so they still cannot meet the requirement of real-time processing over data streams under limited time and memory. To solve these problems, this paper proposes an efficient algorithm based on dataset projection for mining Top-K high-utility itemsets from a data stream. A data structure, CIUDataListSW, is also proposed; it stores the position of each item in the transaction so that the initial projected dataset of the item can be obtained efficiently. To improve projection efficiency, this paper introduces a new reorganization technique for projected transactions in common batches, which maintains the sort order of transactions during dataset projection. A dual pruning strategy and a transaction merging mechanism are also used to further reduce the search space and dataset scanning costs. In addition, based on the proposed CUDHSW structure, an efficient threshold-raising strategy, CUD, is used, and a new threshold-raising strategy, CUDCB, is designed to further shorten the mining time. Experimental results show that the algorithm has great advantages in running time and memory consumption, and that it is especially suitable for mining high-utility itemsets in dense datasets.
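
To make the problem setting concrete, the following is a minimal brute-force sketch of Top-K high-utility itemset mining over a batch-based sliding window. It is not the ETKDS algorithm and uses none of its structures (CIUDataListSW, CUDHSW) or strategies (CUD, CUDCB); it only illustrates the task definition and the basic idea that the K-th best utility found so far can serve as a rising threshold. All function names are hypothetical, and each transaction is assumed to be a dict mapping an item to its utility in that transaction.

```python
from collections import deque
from itertools import combinations
import heapq


def itemset_utility(itemset, transactions):
    """Sum the utility of `itemset` over the transactions containing all its items."""
    total = 0
    for tx in transactions:
        if all(item in tx for item in itemset):
            total += sum(tx[item] for item in itemset)
    return total


def top_k_hui(transactions, k):
    """Enumerate every itemset (exponential cost) and keep the k with highest utility."""
    items = sorted({item for tx in transactions for item in tx})
    heap = []  # min-heap of (utility, itemset); heap[0][0] is the current threshold
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions)
            if utility == 0:
                continue  # itemset does not occur in the window
            if len(heap) < k:
                heapq.heappush(heap, (utility, itemset))
            elif utility > heap[0][0]:
                # The threshold rises whenever a better itemset displaces the k-th one.
                heapq.heapreplace(heap, (utility, itemset))
    return sorted(heap, key=lambda pair: (-pair[0], pair[1]))


def mine_stream(batches, window_batches, k):
    """Report the Top-K itemsets each time the window holds `window_batches` batches."""
    window = deque(maxlen=window_batches)  # the oldest batch is dropped automatically
    results = []
    for batch in batches:
        window.append(batch)
        if len(window) == window_batches:
            transactions = [tx for b in window for tx in b]
            results.append(top_k_hui(transactions, k))
    return results


if __name__ == "__main__":
    # Hypothetical toy stream: three batches, window of two batches, K = 2.
    stream = [
        [{"a": 5, "b": 2}, {"b": 4, "c": 3}],
        [{"a": 10, "c": 1}],
        [{"b": 6, "c": 9}],
    ]
    for top in mine_stream(stream, window_batches=2, k=2):
        print(top)
```

Because this baseline enumerates every itemset and rescans the whole window for each one, its cost grows exponentially with the number of distinct items, which is the kind of overhead that projection, pruning, and threshold-raising strategies aim to avoid.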

Keywords: Itemset mining, utility mining, high utility itemsets, data streams, Top-K high-utility

Journal: Journal of Intelligent & Fuzzy Systems, vol. 41, no. 2, pp. 3317-3338, 2021

Published: 15 September 2021
