A parallel, distributed algorithm for relational frequent pattern discovery from very large data sets

Appice, Annalisa; Ceci, Michelangelo; Turi, Antonio; Malerba, Donato

doi:10.3233/IDA-2010-0456

Issue title: Ubiquitous Knowledge Discovery

Guest editors: João Gama^{x} and Michael May^{y}

Article type: Research Article

Authors: Appice, Annalisa^{*} | Ceci, Michelangelo | Turi, Antonio | Malerba, Donato

Affiliations: Dipartimento di Informatica, Università degli Studi di Bari, Bari, Italy | [x] LIAAD, University of Porto, Porto, Portugal | [y] Fraunhofer IAIS, Sankt Augustin, Germany

Correspondence: [*] Corresponding author: Annalisa Appice, Dipartimento di Informatica, Università degli Studi di Bari, via Orabona, 4 – 70126 Bari, Italy. E-mail: appice@di.uniba.it.

Abstract: The amount of data produced by ubiquitous computing applications is growing quickly, owing to the pervasive presence of small devices endowed with sensing, computing and communication capabilities. The heterogeneity and strong interdependence that characterize ‘ubiquitous data’ call for a (multi-)relational approach to their analysis. However, relational data mining algorithms do not scale well, and very large data sets can hardly be processed. In this paper we propose an extension of a relational algorithm for multi-level frequent pattern discovery that resorts to data sampling and distributed computation in Grid environments in order to overcome the computational limits of the original serial algorithm. The set of patterns discovered by the new algorithm approximates the set of exact solutions found by the serial algorithm. The quality of the approximation depends on three parameters: the proportion of data in each sample, the minimum support thresholds, and the number of samples in which a pattern has to be frequent in order to be considered globally frequent. Since the first two parameters are difficult to control, we focus our investigation on the third. The theoretically derived conclusions are also confirmed experimentally. Moreover, an additional application in the context of event log mining demonstrates the viability of the proposed approach to relational frequent pattern mining from very large data sets.
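
The sample-and-vote scheme summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it swaps relational, multi-level patterns for plain itemsets mined by brute force, and it processes the samples in a sequential loop where the paper distributes them over Grid nodes. The function names and the parameters `proportion`, `n_samples`, `min_support` and `k` are hypothetical stand-ins for the three parameters the abstract names, and the input is assumed to be a non-empty list of transactions.

```python
import random
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return itemsets (up to max_size items) whose relative support in
    `transactions` is at least min_support.

    Brute-force stand-in for the relational miner (an assumption made
    for illustration only).
    """
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    n = len(transactions)
    return {p for p, c in counts.items() if c / n >= min_support}


def approximate_frequent(transactions, proportion, n_samples,
                         min_support, k, seed=0):
    """Keep the patterns that are frequent in at least k of n_samples
    random samples, each holding `proportion` of the data."""
    rng = random.Random(seed)
    sample_size = max(1, round(proportion * len(transactions)))
    votes = Counter()
    # Each iteration is independent of the others, so in a distributed
    # setting every sample could be mined on a separate node.
    for _ in range(n_samples):
        sample = rng.sample(transactions, sample_size)
        votes.update(frequent_itemsets(sample, min_support))
    return {p for p, v in votes.items() if v >= k}


if __name__ == "__main__":
    data = [["a", "b"], ["a", "b"], ["a", "c"], ["b", "c"]]
    # Lower k admits more patterns; higher k is stricter.
    print(approximate_frequent(data, proportion=0.5, n_samples=5,
                               min_support=0.5, k=3))
```

In this sketch, raising `k` can only shrink the result set, which is why the number of samples in which a pattern must be frequent acts as the tuning knob for the approximation.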

Journal: Intelligent Data Analysis, vol. 15, no. 1, pp. 69-88, 2011

Published: 19 January 2011
