Issue title: Environmental Data Mining
Guest editors: Karina Gibert
Article type: Research Article
Authors: Everaert, Gert [a], [*] | Pauwels, Ine [b] | Bennetsen, Elina [a] | Goethals, Peter L.M. [a]
Affiliations: [a] Ghent University, Laboratory of Environmental Toxicology and Aquatic Ecology, Coupure Links 653, B-9000 Ghent, Belgium | [b] Research Institute for Nature and Forest (INBO), Kliniekstraat 25, B-1070, Brussels, Belgium
Correspondence: [*] Corresponding author. Tel.: +32 92 64 38 97; E-mail: gert.everaert@ugent.be.
Abstract: In the present research, we found that different preprocessing options and parameterizations of classification and regression trees alter their model fit and have a direct effect on their applicability for end-users. We found that, in terms of applicability, classification trees react differently to pruning than regression trees. Indeed, at high pruning levels, classification trees focus on the extreme values of the response variable, whereas regression trees are more likely to predict the intermediate values. Furthermore, when applying cross-validation with a high number of folds, modellers are likely to find one model that outperforms the other models in terms of reliability. Models were assessed based on the determination coefficient, the percentage of Correctly Classified Instances and the Cohen’s Kappa statistic for each parameterization. We found positive correlations (R² > 0.70) between the statistical criteria and a non-linear negative relation between the model fit and the level of pruning. Therefore, environmental modellers should make use of an exhaustive list of model parameterizations to develop and compare environmental models in a transparent and objective manner. General methodological guidelines derived from the present research may help modellers to efficiently select statistically and ecologically relevant models that meet the needs of users. The validity of our conclusions should be further tested for other datasets and scientific domains, as our findings are based on one set of freshwater data.
Keywords: Classification and regression tree, parameterization, applicability, field data
DOI: 10.3233/AIC-160711
Journal: AI Communications, vol. 29, no. 6, pp. 711-723, 2016
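Two of the assessment criteria named in the abstract, the percentage of Correctly Classified Instances (CCI) and Cohen's Kappa, can be illustrated with a minimal Python sketch. The class labels and the observed and predicted values below are hypothetical and are not taken from the article's freshwater dataset; the function names `cci` and `cohen_kappa` are likewise illustrative, and the formulas are the standard textbook definitions rather than the authors' own implementation.

```python
from collections import Counter


def cci(observed, predicted):
    """Percentage of Correctly Classified Instances (0-100)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


def cohen_kappa(observed, predicted):
    """Cohen's Kappa: agreement corrected for chance agreement."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    # Observed agreement: fraction of instances classified correctly.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    # Expected chance agreement, from the marginal class frequencies.
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / n ** 2
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical observed and predicted classes (illustration only).
observed = ["low", "low", "moderate", "high", "high", "moderate", "low", "high"]
predicted = ["low", "moderate", "moderate", "high", "high", "low", "low", "high"]

print(f"CCI:   {cci(observed, predicted):.1f}%")        # 75.0%
print(f"Kappa: {cohen_kappa(observed, predicted):.3f}")  # 0.619
```

Because Kappa subtracts the agreement expected by chance, it can rank models differently from CCI when the classes are unevenly represented, which is one reason to report both criteria for each parameterization.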