Variance estimation by multivariate imputation methods in complex survey designs

Kim, Jong-Min; Lee, Kee-Jae; Kim, Wonkuk

doi:10.3233/MAS-170394

Issue title: Machine Learning in Applied Statistics

Guest editors: Jong-Min Kim

Article type: Research Article

Authors: Kim, Jong-Min^a | Lee, Kee-Jae^b | Kim, Wonkuk^{c; *}

Affiliations: [a] Division of Science and Mathematics, University of Minnesota-Morris, Morris, MN, USA | [b] Department of Information Statistics, Korea National Open University, Seoul, Korea | [c] Department of Applied Statistics, Chung-Ang University, Seoul, Korea

Correspondence: [*] Corresponding author: Wonkuk Kim, Department of Applied Statistics, Chung-Ang University, 84 Heukseok-Ro, Dongjak-Gu, Seoul 06974, Korea. Tel.: +82 02 820 6688; Fax: +82 814 5498; E-mail: wkim@cau.ac.kr.

Abstract: In this paper, we consider variance estimation of the sample mean when the missing data have been imputed with multivariate imputation methods. Modern multivariate imputation methods for missing data are complicated and computationally expensive. These methods do not require the normality assumption to impute the missing values. Under this assumption-free condition, we compare the variance estimation performance of six modern multivariate imputation methods, including copula imputation, random forest imputation, principal component analysis imputation, and k-nearest neighbors imputation, in complex sampling designs such as stratified sampling and cluster sampling, and under resampling approaches to variance estimation by jackknife and bootstrap methods in stratified sampling. We conducted simulation studies using National Health and Nutrition Survey data with 5% and 15% missing completely at random (MCAR) rates. Based on our simulation study, with 500 resampling replications, of the mean squared errors of the sample mean in complex survey designs, the random forest (RF) imputation method appears to outperform the other imputation methods overall in terms of percent relative efficiency (RE(%)) when the data have high skewness at the 5% missing rate and when the data have high excess kurtosis at the 15% missing rate, whereas the principal component analysis (PCA) imputation method appears to outperform the other imputation methods when the data have high skewness at the 5% and 15% missing rates. In particular, the multivariate imputation methods appear to be efficient in terms of RE(%) in the cluster sampling design when the data have high skewness or excess kurtosis at the 15% missing rate.
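
The workflow summarized in the abstract (impose MCAR missingness, impute, then estimate the variance of the sample mean by resampling within strata) can be illustrated with a minimal Python sketch. This is not the authors' implementation: stratum-mean imputation stands in for the multivariate imputers named above, the data are synthetic skewed draws rather than survey data, the finite population correction is ignored, and all function names are hypothetical.

```python
import random
import statistics


def make_mcar(values, rate, rng):
    """Set each value to None independently with probability `rate` (MCAR)."""
    return [None if rng.random() < rate else v for v in values]


def impute_stratum_mean(values):
    """Stand-in imputer (assumption): fill gaps with the observed stratum mean."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def stratified_mean(strata, weights):
    """Weighted sum of stratum means; weights are stratum population shares."""
    return sum(w * statistics.fmean(y) for y, w in zip(strata, weights))


def jackknife_variance(strata, weights):
    """Delete-one stratified jackknife variance of the stratified mean (no fpc)."""
    total = 0.0
    for y, w in zip(strata, weights):
        n = len(y)
        ybar = statistics.fmean(y)
        s = sum(y)
        # Squared deviations of the leave-one-out stratum means from the full mean.
        ss = sum(((s - yi) / (n - 1) - ybar) ** 2 for yi in y)
        total += w ** 2 * (n - 1) / n * ss
    return total


# Synthetic, right-skewed data in two strata (illustrative only).
rng = random.Random(0)
raw = [
    [rng.lognormvariate(0.0, 1.0) for _ in range(40)],
    [rng.lognormvariate(0.5, 1.0) for _ in range(60)],
]
weights = [0.4, 0.6]

# 15% MCAR missingness, then imputation within each stratum.
imputed = [impute_stratum_mean(make_mcar(y, 0.15, rng)) for y in raw]

estimate = stratified_mean(imputed, weights)
v_jk = jackknife_variance(imputed, weights)
print(f"stratified mean = {estimate:.4f}, jackknife variance = {v_jk:.6f}")
```

For the stratified mean, this delete-one jackknife reduces algebraically to the textbook estimator, the sum over strata of W_h^2 s_h^2 / n_h. Treating imputed values as if they were observed generally understates the true variance, which is why the choice of imputation method matters for variance estimation.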

Keywords: Missing at random (MAR), copula imputation, jackknife, bootstrap

Journal: Model Assisted Statistics and Applications, vol. 12, no. 3, pp. 195-207, 2017

Published: 30 August 2017
