Affiliations: [a] Martin Tuchman School of Management, Newark, New Jersey, USA
[b] NJIT Department of Computer Science, University Heights, Newark, NJ, USA
[c] Department of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, Charles University, Sokolovská, Prague, Czech Republic
Correspondence:
[*] Corresponding author: Stephen Taylor, Martin Tuchman School of Management, 3000 Central Avenue Building (CAB), Newark, New Jersey 07102, USA and Department of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, Charles University, Sokolovská 83, 186 75 Prague, Czech Republic. E-mail: steve98654@gmail.com.
Abstract: We examine to what extent the GICS (Global Industry Classification Standard) sector categorization of equity securities can be systematically reconstructed from historical quarterly firm fundamental data using gradient boosted tree classification. We examine trade-offs between model complexity and performance and describe relative feature importance. We also outline potential extensions to further improve classification accuracy, including improved feature engineering, validation of internal consistency, and the integration of additional data sources.
Keywords: GICS sector, gradient boosted trees, fundamental data, financial ratios