Article type: Research Article
Authors: Khoshgoftaar, Taghi M.* | Van Hulse, Jason
Affiliations: Florida Atlantic University, Boca Raton, FL, USA
Correspondence: [*] Corresponding author: Taghi M. Khoshgoftaar, Empirical Software Engineering Laboratory, Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL 33431 USA. Tel.: +1 561 297 3994; Fax: +1 561 297 2800; E-mail: taghi@cse.fau.edu.
Abstract: A critical issue in data mining and knowledge discovery is the problem of data quality. Quantifying the presence of noise in a dataset is often used as an indicator of data quality. While existing work has focused mostly on detecting class noise (mislabeling errors), very limited attention has been given to identifying noisy attributes or features. Prior work on noise handling has concentrated on detecting observations that contain noise in either the attributes or the class label, rather than on assessing the attributes themselves. Methodologies that provide insight into the quality of individual attributes can give a domain expert valuable knowledge during data analysis. We present a novel methodology for detecting noisy attributes. The procedure utilizes our recently proposed Pairwise Attribute Noise Detection Algorithm (PANDA), which detects instances with attribute noise. From a data analyst's point of view, our approach provides a viable answer to the question: "Given a dataset, which attribute(s) contain the most noise?" The proposed methodology is investigated through multiple case studies of a real-world software measurement dataset, in which simulated noise is injected into one or more attributes of a dataset that has no class noise. Based on a domain expert's inspection of the obtained results, the effectiveness of our technique for detecting noisy attributes is demonstrated.
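The abstract does not describe how attribute noise is scored, so what follows is only a minimal Python sketch of the question the methodology answers ("which attribute contains the most noise?"). It is not PANDA and not the authors' procedure. It assumes numeric attributes and uses a simplified pairwise, bin-based heuristic chosen for illustration: for each ordered pair of attributes (j, k), instances are split into equal-frequency bins of attribute j, and the fraction of attribute k's variance left unexplained within those bins (one minus the correlation ratio) is measured. Averaging over all j gives a score per attribute k, so an attribute that the other attributes predict poorly scores higher. The function names, the bin count, the noise level, and the synthetic data are all hypothetical.

```python
import random
import statistics


def equal_frequency_bins(values, n_bins):
    """Map each instance index to a bin 0..n_bins-1 so bins hold roughly equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def unexplained_ratio(cond, target, n_bins=5):
    """Fraction of target's variance left within bins of cond (1 - correlation ratio)."""
    groups = {}
    for b, y in zip(equal_frequency_bins(cond, n_bins), target):
        groups.setdefault(b, []).append(y)
    within = 0.0
    for g in groups.values():
        m = statistics.fmean(g)
        within += sum((y - m) ** 2 for y in g)
    mean = statistics.fmean(target)
    total = sum((y - mean) ** 2 for y in target)
    return within / total if total > 0 else 0.0


def attribute_noise_scores(rows, n_bins=5):
    """Hypothetical per-attribute score: mean unexplained ratio of k given each other j."""
    columns = list(zip(*rows))
    scores = []
    for k in range(len(columns)):
        ratios = [unexplained_ratio(columns[j], columns[k], n_bins)
                  for j in range(len(columns)) if j != k]
        scores.append(statistics.fmean(ratios))
    return scores


def simulate_and_rank(seed=0, n=500, noisy_attr=1, noisy_frac=0.2):
    """Build correlated synthetic data, inject noise into one attribute, rank attributes."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        t = rng.uniform(0, 10)  # hypothetical latent factor shared by all attributes
        rows.append([t + rng.gauss(0, 0.1), 2 * t + rng.gauss(0, 0.1), -t + rng.gauss(0, 0.1)])
    # Simulated attribute noise: large Gaussian perturbations on a random subset of instances
    for i in rng.sample(range(n), k=int(noisy_frac * n)):
        rows[i][noisy_attr] += rng.gauss(0, 20)
    scores = attribute_noise_scores(rows)
    ranking = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return scores, ranking


scores, ranking = simulate_and_rank()
print("scores:", [round(s, 3) for s in scores])
print("most suspicious attribute:", ranking[0])
```

The simulation mirrors the abstract's setup only loosely: it injects noise into attribute 1 for 20% of the instances of a synthetic, noise-free dataset and checks which attribute ranks first. The heuristic relies on the attributes being correlated with one another; if they are independent, every score is close to 1 and the ranking carries little information.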
Keywords: Noise detection, data cleaning, software measurement, software quality
DOI: 10.3233/IDA-2005-9606
Journal: Intelligent Data Analysis, vol. 9, no. 6, pp. 589-602, 2005