Affiliations: [a] Department of Psychiatry, University of Iowa, Iowa City, IA, USA
| [b] Department of Biostatistics, University of Iowa, Iowa City, IA, USA
| [c] CHDI Management/CHDI Foundation, Princeton, NJ, USA
| [d] APS Consulting Services Ltd., Washington, DC, USA
| [e] UCL Huntington’s Disease Centre, UCL Queen Square Institute of Neurology, UK Dementia Research Institute, Department of Neurodegenerative Diseases, University College London, London, UK
Correspondence to: Jeffrey Long, University of Iowa, Iowa City, IA, USA. E-mail: jeffrey-long@uiowa.edu.
Abstract: Background: The Huntington’s Disease Integrated Staging System (HD-ISS) has four stages that characterize disease progression. Classification is based on CAG length as a marker of Huntington’s disease (Stage 0), striatum atrophy as a biomarker of pathogenesis (Stage 1), motor or cognitive deficits as HD signs and symptoms (Stage 2), and functional decline (Stage 3). One issue for implementation is that not all variables are measured in every study; another is that the stages are broad and may benefit from progression subgrouping. Objective: Impute stages of the HD-ISS for observational studies in which missing data preclude direct stage classification, and then define progression subgroups within stages. Methods: A machine learning algorithm was used to impute stages. Agreement of the imputed stages with the observed stages was evaluated using graphical methods and propensity score matching. Subgroups were defined based on descriptive statistics and optimal cut-point analysis. Results: There was good overall agreement between the observed stages and the imputed stages, but the algorithm tended to over-assign Stage 0 and under-assign Stage 1 for individuals who were early in progression. Conclusion: There is evidence that the imputed stages can be treated similarly to the observed stages for large-scale analyses. When imaging data are not available, imputation can be avoided by collapsing the first two stages, using the categories Stage ≤ 1, Stage 2, and Stage 3. Progression subgroups defined within a stage can help to identify groups of more homogeneous individuals.