Fundamenta Informaticae - Volume 78, issue 4

Special Issue on Knowledge Discovery

Article Type: Other

Citation: Fundamenta Informaticae, vol. 78, no. 4, pp. i-iii, 2007

SPICE: A New Framework for Data Mining based on Probability Logic and Formal Concept Analysis

Authors: Jiang, Liying | Deogun, Jitender

Article Type: Research Article

Abstract: Formal concept analysis (FCA) and probability logic are two useful tools for data analysis. Data is usually represented as a two-dimensional context of objects and features. FCA discovers dependencies within the data based on the relation between objects and features, whereas probability logic represents, and reasons with, both statistical and propositional probabilities about the data. We propose SPICE (Symbolic integration of Probability Inference and Concept Extraction), which provides a more flexible and robust framework for data mining tasks. Within SPICE, we formalize important notions of data mining, such as concepts and patterns, and develop new notions such as maximal potentially useful patterns. In this paper, we formalize association rule mining in SPICE and propose an enhanced rule mining approach, called SPICE association rule mining, to address the time inefficiency and rule redundancy of general association rule mining. We show an application of the SPICE approach in the Geo-spatial Decision Support System (GDSS). The experimental results show that SPICE can efficiently and effectively discover a succinct set of interesting association rules.

Keywords: FCA, probability logic, data mining, association rules, redundant rules, important items

Citation: Fundamenta Informaticae, vol. 78, no. 4, pp. 467-485, 2007

Price: EUR 27.50
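The abstract builds on the FCA view of data as a context of objects and features. As a rough illustration only, the sketch below implements the two standard FCA derivation operators, the closure they induce, and the usual support and confidence of a rule X -> Y. It is not SPICE and has no probability-logic component; the context and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical formal context: each object maps to the set of features it has.
Context = Dict[str, FrozenSet[str]]


def extent(context: Context, features: FrozenSet[str]) -> Set[str]:
    """Derivation B': the objects that have every feature in B."""
    return {obj for obj, feats in context.items() if features <= feats}


def intent(context: Context, objects: Set[str]) -> FrozenSet[str]:
    """Derivation A': the features shared by every object in A."""
    if not objects:
        # By convention, the empty object set is mapped to all features.
        return frozenset().union(*context.values())
    return frozenset.intersection(*(context[obj] for obj in objects))


def closure(context: Context, features: FrozenSet[str]) -> FrozenSet[str]:
    """B'': the closed itemset (concept intent) generated by B."""
    return intent(context, extent(context, features))


def rule_stats(context: Context, antecedent: FrozenSet[str],
               consequent: FrozenSet[str]) -> Tuple[float, float]:
    """Support and confidence of the rule antecedent -> consequent."""
    covered = extent(context, antecedent | consequent)
    matched = extent(context, antecedent)
    support = len(covered) / len(context)
    confidence = len(covered) / len(matched) if matched else 0.0
    return support, confidence


ctx: Context = {
    "o1": frozenset({"a", "b", "c"}),
    "o2": frozenset({"a", "b"}),
    "o3": frozenset({"a", "c"}),
    "o4": frozenset({"c"}),
}
print(sorted(closure(ctx, frozenset({"b"}))))               # ['a', 'b']
print(rule_stats(ctx, frozenset({"b"}), frozenset({"a"})))  # (0.5, 1.0)
```

Because the closure of {"b"} contains "a", every object with b also has a, so the rule b -> a holds with confidence 1. Restricting attention to rules built from closed itemsets is one common way FCA is used to cut redundant rules.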

Hierarchical Hidden Markov Models for User/Process Profile Learning

Authors: Galassi, Ugo | Botta, Marco | Giordana, Attilio

Article Type: Research Article

Abstract: This paper presents an algorithm for automatically constructing sophisticated user/process profiles from traces of their behavior. A profile is encoded by means of a Hierarchical Hidden Markov Model (HHMM), a well-formalized tool suited to modeling complex patterns in long temporal or spatial sequences. A special subclass of this hierarchical model, oriented to user/process profiling, is also introduced. The algorithm follows a bottom-up strategy in which elementary facts in the sequences (motifs) are progressively grouped, building the abstraction hierarchy of an HHMM layer by layer. The method is first evaluated on artificial data; then a user identification task on real traces is considered. A preliminary experiment with several different users produced encouraging results.

Citation: Fundamenta Informaticae, vol. 78, no. 4, pp. 487-505, 2007

Price: EUR 27.50
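An HHMM can in principle be expanded into an equivalent flat HMM. A simple way to see how a learned profile could support user identification is therefore to score a trace with the forward algorithm and pick the best-scoring profile. The sketch below does this with a flat HMM. It does not build the hierarchy or reproduce the authors' bottom-up learning algorithm, and the decision rule in `identify`, the two profiles, and the event alphabet are all assumptions.

```python
from typing import Dict, List, Sequence


class HMM:
    """A flat discrete HMM, used here as a stand-in for a (flattened) profile."""

    def __init__(self, start: List[float], trans: List[List[float]],
                 emit: List[Dict[str, float]]) -> None:
        self.start = start  # start[i]: probability of starting in state i
        self.trans = trans  # trans[i][j]: probability of moving from state i to j
        self.emit = emit    # emit[i][symbol]: probability of emitting symbol in state i

    def likelihood(self, seq: Sequence[str]) -> float:
        """P(seq | model), computed with the forward algorithm.

        Plain probabilities are fine for short traces; long traces need
        scaling or log-space arithmetic to avoid underflow.
        """
        if not seq:
            return 1.0
        states = range(len(self.start))
        alpha = [self.start[i] * self.emit[i].get(seq[0], 0.0) for i in states]
        for symbol in seq[1:]:
            alpha = [
                sum(alpha[i] * self.trans[i][j] for i in states)
                * self.emit[j].get(symbol, 0.0)
                for j in states
            ]
        return sum(alpha)


def identify(profiles: Dict[str, HMM], trace: Sequence[str]) -> str:
    """Hypothetical decision rule: the profile that best explains the trace."""
    return max(profiles, key=lambda user: profiles[user].likelihood(trace))


# Two hypothetical user profiles over the event alphabet {"a", "b"}.
profiles = {
    "alice": HMM([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]], [{"a": 1.0}, {"b": 1.0}]),
    "bob": HMM([1.0], [[1.0]], [{"a": 0.5, "b": 0.5}]),
}
print(identify(profiles, "abab"))  # alice
print(identify(profiles, "aabb"))  # bob
```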

Privacy Aware Data Management and Chase

Authors: Im, Seunghyun

Article Type: Research Article

Abstract: One of the key applications that use the knowledge discovered by data mining is Chase. Chase is a process that replaces null or missing values with values predicted from that knowledge; it is mainly used to obtain more complete information systems or to replace unknown attribute values in user queries. The process improves the quality of query answers by increasing the volume of reliable data, and it helps the system understand user queries that would otherwise be difficult to process. However, a security breach may occur when a set of data in an information system is confidential. Confidential data can be hidden from public view, but because Chase treats hidden values as null or missing, it can reveal them by predicting their values. In this paper, we discuss the disclosure of confidential data by Chase and protection algorithms that reduce this risk. In particular, the proposed algorithms aim to protect confidential data with the least amount of additional data hiding.

Keywords: Data Mining, Knowledge Discovery in Databases, Security, Data Confidentiality, Chase, Null Value Imputation, Knowledge Inference

Citation: Fundamenta Informaticae, vol. 78, no. 4, pp. 507-524, 2007

Price: EUR 27.50
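The disclosure risk described above can be seen in a toy setting. The sketch below is a deliberately simplified, single-attribute version of Chase: it learns rules of the form "source value -> target value" by majority vote over complete rows, then fills nulls with them. The rule-learning step, the table, and the function names are assumptions; real Chase draws on knowledge discovered by data mining, and the authors' protection algorithms are not shown.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, List, Optional

Row = Dict[str, Optional[str]]  # None marks a null, missing, or hidden value


def learn_rules(rows: List[Row], source: str, target: str) -> Dict[str, str]:
    """Learn rules 'source = s -> target = t' by majority vote over complete rows."""
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for row in rows:
        if row[source] is not None and row[target] is not None:
            counts[row[source]][row[target]] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def chase(rows: List[Row], source: str, target: str) -> List[Row]:
    """Return a copy of rows with null target values replaced by predicted values."""
    rules = learn_rules(rows, source, target)
    filled = []
    for row in rows:
        new_row = dict(row)
        if new_row[target] is None and new_row[source] in rules:
            new_row[target] = rules[new_row[source]]
        filled.append(new_row)
    return filled


# Hypothetical table: the salary of the last employee is confidential and hidden.
table: List[Row] = [
    {"job": "nurse", "salary": "mid"},
    {"job": "nurse", "salary": "mid"},
    {"job": "pilot", "salary": "high"},
    {"job": "pilot", "salary": "high"},
    {"job": "nurse", "salary": None},
]
print(chase(table, "job", "salary")[-1])  # {'job': 'nurse', 'salary': 'mid'}
```

Here hiding the last salary does not protect it: the other nurse rows support the rule job = nurse -> salary = mid, so Chase restores the hidden value. Protection in the sense of the abstract would require hiding additional values so that such a rule can no longer be learned.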

Towards Efficient Searching on the Secondary Structure of Protein Sequences

Authors: Seo, Minkoo | Park, Sanghyun | Won, Jung-Im

Article Type: Research Article

Abstract: Approximate searching on the primary structure (i.e., amino acid arrangement) of protein sequences is an essential part of predicting the functions and evolutionary histories of proteins. However, because evolutionarily distant proteins do not conserve amino acid residue arrangements, approximate searching on proteins' secondary structure is important for detecting distant homology. In this paper, we propose an indexing scheme for efficient approximate searching on the secondary structure of protein sequences, which can be easily implemented in an RDBMS. Exploiting the concepts of clustering and lookahead, the proposed indexing scheme processes three types of secondary structure queries (i.e., exact match, range match, and wildcard match) very quickly. To evaluate the performance of the proposed method, we conducted extensive experiments using a set of actual protein sequences. According to the experimental results, the proposed method was up to 6.3 times faster than existing indexing methods in exact match, up to 3.3 times faster in range match, and up to 1.5 times faster in wildcard match.

Keywords: Indexing method, Secondary structure of proteins, Approximate searching

Citation: Fundamenta Informaticae, vol. 78, no. 4, pp. 525-542, 2007

Price: EUR 27.50

Unifying Framework for Rule Semantics: Application to Gene Expression Data

Authors: Agier, Marie | Petit, Jean-Marc | Suzuki, Einoshin

Article Type: Research Article

Abstract: The notion of rules is very popular and appears in different flavors, for example as association rules in data mining or as functional dependencies in databases. Their syntax is the same, but their semantics differ widely. In the context of gene expression data mining, we introduce three typical examples of rule semantics and, for each one, we point out that Armstrong's axioms are sound and complete. In this setting, we propose a unifying framework in which any "well-formed" semantics for rules may be integrated. We do not focus on the underlying data mining problems posed by the discovery of rules; rather, we discuss the expressiveness of our contribution in a particular application domain: the understanding of gene regulatory networks from gene expression data. The key idea is that biologists can choose, among some predefined semantics, the meaning of their rules that best fits their requirements, or define their own. Our proposal has been implemented and integrated into an existing open-source system named MeV of the TIGR environment, devoted to microarray data interpretation. An application has been performed on expression profiles of a sub-sample of genes from breast cancer tumors.

Keywords: Rules, implications, Armstrong's axiom system, data mining

Citation: Fundamenta Informaticae, vol. 78, no. 4, pp. 543-559, 2007

Price: EUR 27.50
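One practical consequence of Armstrong's axioms being sound and complete for a given semantics is that whether a rule follows from a set of rules can be decided syntactically, with the classical attribute-closure algorithm from database theory, whichever of those semantics produced the rules. A minimal sketch follows; the gene names and rules are hypothetical, and this is not the authors' implementation in MeV.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# A rule X -> Y over attributes (here, hypothetical gene names).
Rule = Tuple[FrozenSet[str], FrozenSet[str]]


def attribute_closure(attrs: Iterable[str], rules: Iterable[Rule]) -> FrozenSet[str]:
    """X+: every attribute reachable from X by repeatedly applying the rules."""
    closure: Set[str] = set(attrs)
    rule_list: List[Rule] = list(rules)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rule_list:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return frozenset(closure)


def implies(rules: Iterable[Rule], lhs: Iterable[str], rhs: Iterable[str]) -> bool:
    """X -> Y is derivable with Armstrong's axioms iff Y is a subset of X+."""
    return frozenset(rhs) <= attribute_closure(lhs, rules)


example_rules: List[Rule] = [
    (frozenset({"g1"}), frozenset({"g2"})),
    (frozenset({"g2", "g4"}), frozenset({"g3"})),
]
print(implies(example_rules, {"g1", "g4"}, {"g3"}))  # True
print(implies(example_rules, {"g1"}, {"g3"}))        # False
```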

Visualization of Differences between Rules' Syntactic and Semantic Similarities using Multidimensional Scaling

Authors: Tsumoto, Shusaku | Hirano, Shoji

Article Type: Research Article

Abstract: One of the most important problems with rule induction methods is that it is very difficult for domain experts to check millions of rules generated from large datasets, although discovery from these rules requires deep interpretation based on domain knowledge. Although several solutions have been proposed in studies on data mining and knowledge discovery, these studies do not focus on similarities between the rules obtained. When one rule r_1 has reasonable features and another rule r_2 with high similarity to r_1 includes unexpected factors, the relation between these rules can trigger the discovery of knowledge. In this paper, we propose a visualization approach that shows the similarity relations between rules based on multidimensional scaling, which assigns two-dimensional Cartesian coordinates to each data point using information about its similarities to the other data points. We evaluated this method on two medical data sets; the experimental results show that knowledge useful for domain experts can be found.

Keywords: Rule Induction, Rough Sets, Multidimensional Scaling, Visualization

Citation: Fundamenta Informaticae, vol. 78, no. 4, pp. 561-573, 2007

Price: EUR 27.50
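The abstract does not give the similarity measures or the MDS variant used. As an illustration only, the sketch below applies classical (Torgerson) MDS to a hypothetical rule-similarity matrix, converting similarities to distances with d = 1 - s (an assumption) and finding the top eigenpairs by power iteration so that only the Python standard library is needed. It assumes the distances are close to Euclidean; negative eigenvalues are simply dropped.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def _matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _top_eigenpair(m: Matrix, seed: int, iters: int = 1000) -> Tuple[float, List[float]]:
    """Dominant eigenpair of a symmetric matrix by power iteration."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in m]
    for _ in range(iters):
        w = _matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left to extract in this direction
            return 0.0, [0.0] * len(m)
        v = [x / norm for x in w]
    return sum(a * b for a, b in zip(v, _matvec(m, v))), v  # Rayleigh quotient


def classical_mds(d: Matrix, dims: int = 2) -> Matrix:
    """Classical MDS: coordinates whose pairwise distances approximate d."""
    n = len(d)
    sq = [[x * x for x in row] for row in d]
    mean = [sum(row) / n for row in sq]  # row means (= column means, d is symmetric)
    grand = sum(mean) / n
    # Double centering: B = -1/2 * J D^2 J
    b = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + grand) for j in range(n)]
         for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        lam, v = _top_eigenpair(b, seed=k)
        scale = math.sqrt(max(lam, 0.0))  # negative eigenvalue: non-Euclidean part, dropped
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next power iteration finds the next eigenpair.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Hypothetical pairwise similarities between three rules, turned into distances.
similarity = [[1.0, 0.8, 0.3], [0.8, 1.0, 0.4], [0.3, 0.4, 1.0]]
distance = [[1.0 - s for s in row] for row in similarity]
for name, (x, y) in zip(["r1", "r2", "r3"], classical_mds(distance)):
    print(f"{name}: ({x:+.3f}, {y:+.3f})")
```

Rules that are similar end up close together in the plane, so a pair such as r1 and r2 can be inspected side by side.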

A Contribution to the Use of Decision Diagrams for Loading and Mining Transaction Databases

Authors: Salleb-Aouissi, Ansaf | Vrain, Christel

Article Type: Research Article

Abstract: In this paper, we mainly address the problem of loading transaction datasets into main memory and estimating the density of such datasets. We propose BOOLLOADER, an algorithm dedicated to these tasks; it relies on a compressed representation of all the transactions of the dataset. For the sake of efficiency, we have chosen Decision Diagrams as the main data structure for representing datasets in memory. We give an experimental evaluation of our algorithm on both dense and sparse datasets. Experiments have shown that BOOLLOADER is efficient for loading some dense datasets and gives a partial answer about the nature of the dataset before time-consuming pattern extraction tasks. We further investigate the use of Algebraic Decision Diagrams by studying the feasibility of current Data Mining operations, for instance computing the support of an itemset and even mining frequent itemsets.

Keywords: Transaction dataset, boolean function, decision diagram, density, frequent itemset

Citation: Fundamenta Informaticae, vol. 78, no. 4, pp. 575-594, 2007

Price: EUR 27.50
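To make the idea of a compressed transaction representation concrete, the sketch below builds a small reduced, ordered algebraic decision diagram whose leaves count transactions, then computes the support of an itemset by traversing it. It is a from-scratch illustration under its own assumptions (fixed item order 0..n-1, hash-consed nodes, no decision-diagram library); it is not BOOLLOADER, and the class and method names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple


class CountADD:
    """Reduced, ordered ADD mapping each item vector x to the number of
    transactions equal to x (hypothetical sketch)."""

    def __init__(self, n_items: int) -> None:
        self.n = n_items
        # Node = (var, low, high); a leaf is (n_items, count, -1).
        self.nodes: List[Tuple[int, int, int]] = []
        self.unique: Dict[Tuple[int, int, int], int] = {}

    def _mk(self, var: int, low: int, high: int) -> int:
        if var < self.n and low == high:
            return low  # reduction: the test on `var` is redundant
        key = (var, low, high)
        if key not in self.unique:  # hash-consing shares identical subgraphs
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def build(self, transactions: List[FrozenSet[int]], var: int = 0) -> int:
        if not transactions:
            return self._mk(self.n, 0, -1)
        if var == self.n:
            return self._mk(self.n, len(transactions), -1)
        low = self.build([t for t in transactions if var not in t], var + 1)
        high = self.build([t for t in transactions if var in t], var + 1)
        return self._mk(var, low, high)

    def support(self, root: int, itemset: Iterable[int]) -> int:
        """Number of transactions containing every item of `itemset`."""
        items = frozenset(itemset)
        memo: Dict[Tuple[int, int], int] = {}

        def count(node: int, level: int) -> int:
            if (node, level) in memo:
                return memo[(node, level)]
            var, low, high = self.nodes[node]
            # A variable skipped by reduction does not change the value, so each
            # skipped item that is not required by the itemset doubles the sum.
            factor = 2 ** sum(1 for v in range(level, var) if v not in items)
            if var == self.n:
                result = factor * low  # leaf: `low` holds the count
            elif var in items:
                result = factor * count(high, var + 1)
            else:
                result = factor * (count(low, var + 1) + count(high, var + 1))
            memo[(node, level)] = result
            return result

        return count(root, 0)


transactions = [frozenset({0, 1}), frozenset({0}), frozenset({1, 2}), frozenset({0, 1})]
add = CountADD(n_items=3)
root = add.build(transactions)
print(add.support(root, {0}), add.support(root, {0, 1}), add.support(root, set()))  # 3 2 4
```

Duplicate transactions collapse into a single path with a larger leaf count, which is where the compression comes from on dense data.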

Privacy Preserving Database Generation for Database Application Testing

Authors: Wu, Xintao | Wang, Yongge | Guo, Songtao | Zheng, Yuliang

Article Type: Research Article

Abstract: Testing of database applications is of great importance. Although various studies have been conducted to investigate testing techniques for database design, relatively few efforts have been made to explicitly address the testing of database applications, which requires a large amount of representative data. As testing over live production databases is often infeasible because of the high risk of disclosing confidential information or incorrectly updating real data, in this paper we investigate the problem of generating synthetic databases based on a-priori knowledge about production databases. Our approach is to fit the general location model using various characteristics (e.g., constraints, statistics, rules) extracted from a production database and then generate synthetic data using the learned model. The generated data is valid and similar to real data in terms of statistical distribution; hence it can be used for functional and performance testing. As the extracted characteristics may contain information that attackers could use to derive confidential information about individuals, we present a disclosure analysis method that applies cell suppression for identity disclosure and perturbation for value disclosure.

Keywords: Data Generation, Disclosure Analysis, Statistical Database Modeling

Citation: Fundamenta Informaticae, vol. 78, no. 4, pp. 595-612, 2007

Price: EUR 27.50
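The two protections named in the abstract can be sketched generically. Below, cell suppression withholds any cell of a released contingency table whose count falls below a threshold k, and perturbation adds Gaussian noise to numeric values. The threshold, the noise model, the records, and the function names are assumptions; complementary suppression (needed so that suppressed cells cannot be recomputed from marginal totals) and the fitting of the general location model are not shown.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple


def suppressed_counts(records: Sequence[Dict[str, str]], keys: Sequence[str],
                      k: int) -> Dict[Tuple[str, ...], Optional[int]]:
    """Contingency counts over `keys`; cells with fewer than k records are
    suppressed (None). Only primary suppression is done here."""
    counts = Counter(tuple(rec[key] for key in keys) for rec in records)
    return {cell: (c if c >= k else None) for cell, c in counts.items()}


def perturb(values: Sequence[float], sigma: float, seed: int = 0) -> List[float]:
    """Additive Gaussian noise, a basic perturbation for numeric values."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in values]


# Hypothetical extract from a production table.
people = [
    {"region": "north", "gender": "F"},
    {"region": "north", "gender": "F"},
    {"region": "north", "gender": "F"},
    {"region": "south", "gender": "M"},
]
print(suppressed_counts(people, ["region", "gender"], k=3))
# {('north', 'F'): 3, ('south', 'M'): None}
```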

Sound Isolation by Harmonic Peak Partition For Music Instrument Recognition

Authors: Zhang, Xin | Raś, Zbigniew W.

Article Type: Research Article

Abstract: Identification of music instruments in polyphonic sounds is difficult and challenging, especially where heterogeneous harmonic partials overlap with each other. This has stimulated research on sound separation for content-based automatic music information retrieval. Numerous successful approaches to musical data feature extraction and selection have been proposed for instrument recognition in monophonic sounds. Unfortunately, none of those algorithms can be successfully applied to polyphonic sounds. Based on recent research in the classification of monophonic sounds and studies in speech recognition, the Moving Picture Experts Group (MPEG) standardized a set of features of digital audio content data for the purpose of interpreting the meaning of the information. Most of them are in the form of a large matrix or a vector of large size, which is not suitable for traditional data mining algorithms, while other, smaller features are not sufficient for instrument recognition in polyphonic sounds. Therefore, these acoustical features alone cannot be successfully applied to the classification of polyphonic sounds. However, these features contain critical information that reflects music instruments' signatures. We have proposed a novel music information retrieval system with MPEG-7-based descriptors, and we built classifiers that can retrieve the important time-frequency timbre information and isolate sound sources in polyphonic musical objects, where two instruments are playing at the same time, by energy clustering between heterogeneous harmonic peaks.

Keywords: Music Instruments Detection, MPEG-7 descriptors, Musical Sound Separation, Energy Clustering, Sound Classification, Feature Extraction

Citation: Fundamenta Informaticae, vol. 78, no. 4, pp. 613-628, 2007

Price: EUR 27.50
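The abstract does not detail the energy-clustering step. Purely as an illustration of what partitioning harmonic peaks between two simultaneous instruments involves, the sketch below assumes that spectral peaks and the two fundamental frequencies have already been estimated. It assigns each peak to the source(s) whose harmonic series it matches within a relative tolerance and splits the energy of overlapping partials equally. The tolerance, the equal split, the peak values, and all names are assumptions; the authors' MPEG-7-based system is not reproduced.

```python
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (frequency in Hz, energy)


def near_harmonic(freq: float, f0: float, tol: float) -> bool:
    """True if freq lies within a relative tolerance of an integer multiple of f0."""
    h = round(freq / f0)
    return h >= 1 and abs(freq - h * f0) <= tol * h * f0


def partition_peaks(peaks: List[Peak], f0s: Dict[str, float],
                    tol: float = 0.03) -> Dict[str, List[Peak]]:
    """Assign each peak to every source whose harmonic series it matches;
    peaks matching no source are dropped."""
    parts: Dict[str, List[Peak]] = {name: [] for name in f0s}
    for freq, energy in peaks:
        owners = [name for name, f0 in f0s.items() if near_harmonic(freq, f0, tol)]
        for name in owners:
            # Overlapping partial: split its energy equally (simplifying assumption).
            parts[name].append((freq, energy / len(owners)))
    return parts


peaks = [(200.0, 1.0), (300.0, 0.8), (400.0, 0.5), (600.0, 0.6), (900.0, 0.2)]
print(partition_peaks(peaks, {"A": 200.0, "B": 300.0}))
```

The 600 Hz peak is the third harmonic of A and the second harmonic of B, so it is shared; this is exactly the overlapping-partial case that makes polyphonic recognition hard.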
