Affiliations: [a] Biocircuits Institute, University of California San Diego, San Diego, CA, USA. Tel: 1-858-534-1942, Fax: 1-858-534-1892; E-mail: rhuerta@ucsd.edu | [b] Computer Science Department, Universidad Autonoma de Madrid, Madrid, Spain; E-mail: fernando.corbacho@cognodata.com | [c] Computer Science Department, University of California San Diego, San Diego, CA, USA; E-mail: elkan@ucsd.edu
Note: [*] Also at Cognodata Consulting, S. L.
Abstract: This paper investigates the profitability of a trading strategy based on training a model to identify stocks with high or low predicted returns. A tail set is defined to be a group of stocks whose volatility-adjusted price change is in the highest or lowest quantile, for example the highest or lowest 5%. Each stock is represented by a set of technical and fundamental features computed using CRSP and Compustat data. A classifier is trained on historical tail sets and tested on future data. The classifier is chosen to be a nonlinear support vector machine (SVM) due to its simplicity and effectiveness. The SVM is retrained once per month in order to adjust to changing market conditions. Portfolios are formed by ranking stocks using the classifier output. The highest-ranked stocks are used for long positions and the lowest-ranked ones for short sales. The Global Industry Classification Standard is used to build a model for each sector, so that a total of 8 long-short portfolios are formed, one each for Energy, Materials, Industrials, Consumer Discretionary, Consumer Staples, Health Care, Financials, and Information Technology. The data range from 1981 to 2010. Trading costs are not measured, but 91-day holding periods are used to limit them. Under these conditions, the strategy leads to annual excess returns (Jensen alpha) of 15% with volatilities under 8% when the top 25% of the stocks of the distribution are used for training long positions and the bottom 25% for the short ones.
Keywords: Support vector machines, sector-neutral portfolios, long-short portfolios, technical analysis, fundamental analysis
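
To make the tail-set definition and the ranking step from the abstract concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that the volatility adjustment is a simple division of the price change by a volatility estimate, which the abstract does not specify. The function names `tail_set_labels` and `long_short_selection`, and the parameters `quantile` and `n_positions`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def tail_set_labels(price_change, volatility, quantile=0.25):
    """Label each stock +1 (upper tail), -1 (lower tail), or 0 (neither).

    Assumption: the volatility-adjusted price change is price_change / volatility.
    """
    adjusted = np.asarray(price_change, dtype=float) / np.asarray(volatility, dtype=float)
    # Cut points for the lower and upper tails of the cross-section.
    lower_cut, upper_cut = np.quantile(adjusted, [quantile, 1.0 - quantile])
    labels = np.zeros(adjusted.shape, dtype=int)
    labels[adjusted >= upper_cut] = 1
    labels[adjusted <= lower_cut] = -1
    return labels


def long_short_selection(scores, n_positions):
    """Return (long_indices, short_indices) by ranking classifier scores.

    The highest-scored stocks go long and the lowest-scored go short.
    """
    order = np.argsort(np.asarray(scores, dtype=float))  # ascending order
    return order[-n_positions:], order[:n_positions]
```

In such a setup, the stocks labeled +1 and -1 would serve as the two training classes for a classifier, stocks labeled 0 would be left out of training, and `long_short_selection` would be applied to the classifier output within each sector.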