Efficiently mining high utility sequential patterns in static and streaming data

Zihayat, Morteza; Wu, Cheng-Wei; An, Aijun; Tseng, Vincent S.; Lin, Chien

doi:10.3233/IDA-170874

Article type: Research Article

Authors: Zihayat, Morteza^{a,*} | Wu, Cheng-Wei^b | An, Aijun^a | Tseng, Vincent S.^b | Lin, Chien^c

Affiliations: [a] Department of Computer Science and Engineering, York University, Toronto, ON, Canada | [b] Department of Computer Science, National Chiao Tung University, Taiwan | [c] Smart Network System Institute, Institute for Information Industry, Taipei, Taiwan

Correspondence: [*] Corresponding author: Morteza Zihayat, Department of Computer Science and Engineering, York University, Toronto, ON, Canada. E-mail: zihayatm@cse.yorku.ca

Abstract: High utility sequential pattern (HUSP) mining has emerged as a novel topic in data mining. Although some preliminary work has been done on this topic, existing methods suffer from a large search space when mining high utility sequential patterns. In addition, they mainly focus on mining HUSPs in static databases and do not take into account streaming data, where unbounded data arrive continuously and often at high speed. To address both problems efficiently, we propose a novel framework for mining high utility sequential patterns over static and streaming databases. In this framework, two efficient data structures, ItemUtilLists (Item Utility Lists) and HUSP-Tree (High Utility Sequential Pattern Tree), are proposed to maintain the essential information for mining HUSPs in both offline and online fashion. In addition, a novel utility model called Sequence-Suffix Utility is proposed to effectively prune the search space in HUSP mining. We propose an algorithm named HUSP-Miner (High Utility Sequential Pattern Miner) to efficiently find HUSPs in static databases. Then, a one-pass algorithm named HUSP-Stream (High Utility Sequential Pattern mining over Data Streams) is proposed to incrementally update ItemUtilLists and HUSP-Tree online and find HUSPs over data streams. To the best of our knowledge, HUSP-Stream is the first method to find HUSPs over data streams. Experimental results on both real and synthetic datasets show that HUSP-Miner substantially outperforms the compared algorithms in terms of execution time, memory usage and number of generated candidates. The experiments also demonstrate the strong performance of HUSP-Stream in updating the data structures and discovering HUSPs over data streams.
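To make the notion of a high utility sequential pattern concrete, here is a minimal Python sketch. It assumes the definition commonly used in the HUSP literature: an item's utility in an itemset is its quantity times its unit profit (pre-multiplied here), a pattern's utility in one sequence is the maximum over all of its occurrences, and its database utility is the sum over sequences. It is not the paper's HUSP-Miner, ItemUtilLists or Sequence-Suffix Utility. The names `max_utility_in_sequence`, `pattern_utility`, `is_high_utility` and the absolute threshold `min_util` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional

# Hypothetical representation: each itemset of a quantitative sequence maps
# item -> utility (quantity x unit profit, assumed already multiplied).
QItemset = Dict[str, int]
QSequence = List[QItemset]
Pattern = List[FrozenSet[str]]


def max_utility_in_sequence(pattern: Pattern, seq: QSequence) -> Optional[int]:
    """Maximum utility of `pattern` over all its occurrences in `seq`, or None if absent."""
    k = len(pattern)
    # best[j]: highest utility of matching pattern[:j] to strictly increasing
    # itemset positions among the itemsets scanned so far (None = no match yet).
    best: List[Optional[int]] = [0] + [None] * k
    for itemset in seq:
        # Walk j downward so one itemset never extends two consecutive pattern elements.
        for j in range(k, 0, -1):
            prev = best[j - 1]
            if prev is not None and pattern[j - 1].issubset(itemset):
                gain = prev + sum(itemset[item] for item in pattern[j - 1])
                if best[j] is None or gain > best[j]:
                    best[j] = gain
    return best[k]


def pattern_utility(pattern: Pattern, db: List[QSequence]) -> int:
    """Database utility: sum of the per-sequence maximum utilities."""
    total = 0
    for seq in db:
        u = max_utility_in_sequence(pattern, seq)
        if u is not None:
            total += u
    return total


def is_high_utility(pattern: Pattern, db: List[QSequence], min_util: int) -> bool:
    # Hypothetical absolute threshold; the exact threshold form used in the paper is not stated here.
    return pattern_utility(pattern, db) >= min_util


if __name__ == "__main__":
    db = [
        [{"a": 2, "b": 3}, {"c": 4}, {"a": 5, "c": 1}],
        [{"b": 1}, {"a": 6}, {"c": 2}],
    ]
    ac_seq = [frozenset({"a"}), frozenset({"c"})]  # <a, c>: a, then c in a later itemset
    ac_set = [frozenset({"a", "c"})]               # <(a c)>: a and c in the same itemset
    print(pattern_utility(ac_seq, db))  # 6 + 8 = 14
    print(pattern_utility(ac_set, db))  # 6 (only the first sequence contains it)
    print(is_high_utility(ac_seq, db, 10), is_high_utility(ac_set, db, 10))  # True False
```

Taking the maximum over occurrences is what makes utility non-monotone: a super-pattern can have higher utility than its sub-patterns, so the downward-closure pruning of frequent pattern mining does not apply directly. Checking every candidate one by one as above is the large search space the abstract refers to; the proposed data structures and utility model are aimed at pruning it.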

Keywords: High utility sequential pattern mining, data streams, sliding window

Journal: Intelligent Data Analysis, vol. 21, no. S1, pp. S103-S135, 2017

Published: 1 April 2017