
Iterative method for reducing the impact of outlying data points: Ensuring data quality

Abstract

Data editing is essential for checking survey data for possible problems. Outlying data values are frequently encountered in sample surveys. Consequently, when working with data, the correctness of the reported values must be verified, and if a reported value constitutes an outlier, its appropriate treatment needs to be considered. In this paper, an iterative method for reducing the impact of outlying data points is proposed. The novelty of the method is threefold: outlying data points are determined iteratively; outliers are determined considering the impact of conjoined factors; and the weight coefficients of the outliers and the total measurement error of the non-linear regression model are estimated.
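The abstract does not give the algorithmic details of the proposed method, so the following is only a generic sketch of what iterative outlier detection with down-weighting can look like, not the authors' procedure. It substitutes a well-known univariate rule, the modified z-score based on the median and the median absolute deviation (MAD) with the commonly used cutoff of 3.5, and it ignores both the conjoined-factor and the non-linear regression aspects. The function name `iterative_outlier_weights`, the parameters `threshold` and `max_iter`, and the weight formula `threshold / z` are assumptions made for illustration.

```python
from statistics import median


def iterative_outlier_weights(values, threshold=3.5, max_iter=10):
    """Illustrative sketch only (not the paper's method).

    Repeatedly flags points whose modified z-score exceeds `threshold`,
    recomputing the median and MAD on the points that remain unflagged,
    then assigns each flagged point a weight below 1.
    """
    flagged = set()
    for _ in range(max_iter):
        kept = [v for i, v in enumerate(values) if i not in flagged]
        if len(kept) < 3:
            break  # too few points left to estimate spread
        med = median(kept)
        mad = median(abs(v - med) for v in kept)
        if mad == 0:
            break  # spread estimate degenerate; stop iterating
        newly_flagged = {
            i
            for i, v in enumerate(values)
            if i not in flagged and 0.6745 * abs(v - med) / mad > threshold
        }
        if not newly_flagged:
            break  # converged: no further outliers found
        flagged |= newly_flagged

    # Final location and spread from the unflagged points.
    kept = [v for i, v in enumerate(values) if i not in flagged]
    med = median(kept)
    mad = median(abs(v - med) for v in kept)

    weights = []
    for i, v in enumerate(values):
        z = 0.6745 * abs(v - med) / mad if mad > 0 else 0.0
        if i in flagged and z > threshold:
            # Assumed weighting rule: shrink influence in proportion to z.
            weights.append(threshold / z)
        else:
            weights.append(1.0)
    return sorted(flagged), weights


if __name__ == "__main__":
    sample = [10, 12, 11, 13, 12, 11, 10, 12, 200]
    outlier_idx, w = iterative_outlier_weights(sample)
    print(outlier_idx, [round(x, 3) for x in w])
```

Recomputing the median and MAD after each round is what makes the procedure iterative: once the most extreme values are set aside, the spread estimate tightens and less extreme outliers that were previously masked can be detected.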
