Software understanding: Automatic classification of software identifiers

Warintarawej, P.; Huchard, M.; Lafourcade, M.; Laurent, A.; Pompidor, P.

doi:10.3233/IDA-150744

Article type: Research Article

Authors: Warintarawej, P. | Huchard, M. | Lafourcade, M. | Laurent, A.* | Pompidor, P.

Affiliations: LIRMM, CNRS-Université de Montpellier, Montpellier, France

Correspondence: [*] Corresponding author: A. Laurent, LIRMM, UMR 5506 CNRS-Université de Montpellier 2, 161 rue Ada, 34095 Montpellier Cedex 05, France. Tel.: +33 0 467 418 585; Fax: +33 0 467 418 500; E-mail: laurent@lirmm.fr

Abstract: Identifier names (e.g., of packages, classes, methods and variables) are one of the most important sources for software comprehension. They need to be analyzed to support collaborative software engineering and source code reuse, since they convey the domain concepts of the software. For instance, "getMinimumSupport" can be associated with the association-rule concept in data mining software, while other identifiers are harder to recognize, for example when they combine fragments of words (e.g., "initFeatSet"). We therefore propose methods that assist automatic software understanding by classifying identifier names into domain concept categories. Our solution is based on data mining algorithms and learns the character patterns of identifier names. The main challenges are (1) to automatically split identifier names into relevant constituent subnames, and (2) to build a model that associates such a set of subnames with predefined domain concepts. For this purpose, we propose a novel method for splitting identifiers into their constituent words and use N-gram-based text classification to predict the related domain concept. In this article, we present the theoretical method and the algorithms we propose, together with experiments on real software source code that demonstrate the value of our approach.
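The two steps described in the abstract (splitting identifiers, then predicting a concept from N-grams) can be illustrated with a minimal Python sketch. It is not the authors' method. The splitter is a naive regular-expression baseline that only cuts on underscores, digits and camelCase boundaries (the paper proposes its own splitting technique), and the classifier is a simple nearest-profile scheme over character trigrams with cosine similarity, used here as a stand-in for N-gram-based text classification. The function names (`split_identifier`, `char_ngrams`, `cosine`, `train`, `classify`), the trigram size and the small labeled training set with its concept names are all hypothetical.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical labeled examples: identifier -> domain concept (illustrative only).
TRAINING = [
    ("getMinimumSupport", "association_rules"),
    ("computeConfidence", "association_rules"),
    ("frequentItemsets", "association_rules"),
    ("initFeatureSet", "classification"),
    ("trainClassifier", "classification"),
    ("predictLabel", "classification"),
]

def split_identifier(name: str) -> list[str]:
    """Naive baseline splitter (not the paper's method): cuts on underscores,
    digits and camelCase boundaries, keeping acronyms such as 'HTML' together.
    It does not expand abbreviations like 'feat'."""
    parts = re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", name)
    return [p.lower() for p in parts]

def char_ngrams(words: list[str], n: int = 3) -> Counter:
    """Count character n-grams of each subname, padded with '_' so that
    word starts and ends produce their own n-grams."""
    grams = Counter()
    for w in words:
        padded = f"_{w}_"
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[g] for g, count in a.items() if g in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def train(examples: list[tuple[str, str]]) -> dict[str, Counter]:
    """Build one aggregated n-gram profile per concept (hypothetical scheme)."""
    profiles: dict[str, Counter] = {}
    for ident, concept in examples:
        profiles.setdefault(concept, Counter()).update(char_ngrams(split_identifier(ident)))
    return profiles

def classify(ident: str, profiles: dict[str, Counter]) -> str:
    """Return the concept whose profile is most similar to the identifier."""
    grams = char_ngrams(split_identifier(ident))
    return max(profiles, key=lambda c: cosine(grams, profiles[c]))

if __name__ == "__main__":
    print(split_identifier("initFeatSet"))  # ['init', 'feat', 'set']
    profiles = train(TRAINING)
    print(classify("minSupportThreshold", profiles))   # association_rules
    print(classify("updateFeatureWeights", profiles))  # classification
```

Character n-grams are a reasonable fit for abbreviated subnames such as "feat" in "initFeatSet": the fragment shares the trigrams "_fe", "fea" and "eat" with "feature", so it can still be matched to a concept whose training identifiers use the full word, even though the naive splitter never expands the abbreviation.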

Keywords: Automatic software understanding, data mining, text classification, software engineering

Journal: Intelligent Data Analysis, vol. 19, no. 4, pp. 761-778, 2015

Published: 2015
