Searching for just a few words should be enough to get started. If you need to make more complex queries, use the tips below to guide you.
Article type: Research Article
Authors: Saneifar, Hassana; * | Bonniol, Stéphaned | Poncelet, Pascalb | Roche, Mathieub; c
Affiliations: [a] Raja University, Computer Faculty, Tehran, Iran | [b] LIRMM – CNRS – University Montpellier 2, Montpellier, France | [c] UMR TETIS, Cirad, Irstea, Montpellier, France | [d] Satin Technologies, MIBI, 672 rue du Mas de Verchant, Montpellier, France
Correspondence: [*] Corresponding author: Hassan Saneifar, Raja University, Computer Faculty, Tehran, Iran. E-mail: Hassan.Saneifar@lirmm.fr.
Abstract: With the development of new technologies more and more information is stored in log files. Analyzing such logs can be very useful for the decision maker. One of the probably best known example is the Web log file analysis where lots of efficient tools have been proposed to extract the top-k accessed pages, the best users or even the patterns describing the behaviors of users on a Web site. These tools take advantages of the well-formed structures of the data. Unfortunately, logs files from the industrial world have very heterogeneous complex structures (e.g., tables, lists, data blocks). For experts, analyzing logs to find messages helping to better understand causes of a failure, if a problem have already occurred in the past or even knowing the main consequences of a failure is a hard, tedious, time-consuming and error-prone task. There is thus a need for new tools helping the experts to easily recognize the appropriate part in logs. Passage retrieval methods have proved to be very useful for extracting relevant parts in documents. In this paper we propose a new approach for automatically split logs files into relevant segments based on their logical units. We characterize the complex logical units found in logs according to their syntactic characteristics. We also introduce the notion of generalized vs-grams which is used to automatically extract the syntactic characteristics of special structures found in log files. Conducted experiments are performed on real datasets from the industrial world to demonstrate the efficiency of our proposal on the recognition of complex logical units.
Keywords: Log files, text segmentation, logical units, generalized vs-grams
DOI: 10.3233/IDA-150724
Journal: Intelligent Data Analysis, vol. 19, no. 2, pp. 431-448, 2015
IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
USA
Tel: +1 703 830 6300
Fax: +1 703 830 2300
sales@iospress.com
For editorial issues, like the status of your submitted paper or proposals, write to editorial@iospress.nl
IOS Press
Nieuwe Hemweg 6B
1013 BG Amsterdam
The Netherlands
Tel: +31 20 688 3355
Fax: +31 20 687 0091
info@iospress.nl
For editorial issues, permissions, book requests, submissions and proceedings, contact the Amsterdam office info@iospress.nl
Inspirees International (China Office)
Ciyunsi Beili 207(CapitaLand), Bld 1, 7-901
100025, Beijing
China
Free service line: 400 661 8717
Fax: +86 10 8446 7947
china@iospress.cn
For editorial issues, like the status of your submitted paper or proposals, write to editorial@iospress.nl
如果您在出版方面需要帮助或有任何建, 件至: editorial@iospress.nl