
Producing multiple tables for small areas with confidentiality protection

Abstract

One approach to protecting the confidentiality of tabular data is to apply statistical disclosure control (SDC) methods that randomly perturb selected variables in the underlying microdata set. Because the perturbation is applied before tabulation, a pre-tabular SDC approach has the attraction of retaining both consistency and additivity across tables. However, the effect of the perturbation needs to be reflected in the variance estimates for the estimates presented in the table cells. This paper describes the pre-tabular SDC method used for the 2006-2010 Census Transportation Planning Products and a method for estimating the variances of the cell estimates that accounts for the perturbation variance.
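
The additivity property mentioned above can be illustrated with a minimal sketch. It is not the perturbation method used for the Census Transportation Planning Products. The record layout (`area`, `mode`), the perturbation rule (randomly reassigning `mode` for a fraction of records), and the helper names `perturb` and `tabulate` are all assumptions made for illustration. The point is only that when every table is built from the same perturbed microdata, cell counts sum to their margins in every table.

```python
import random
from collections import Counter

def perturb(records, rate, rng):
    """Hypothetical perturbation: reassign 'mode' at random for about `rate` of the records."""
    modes = sorted({rec["mode"] for rec in records})
    perturbed = []
    for rec in records:
        rec = dict(rec)  # copy so the original microdata are left untouched
        if rng.random() < rate:
            rec["mode"] = rng.choice(modes)
        perturbed.append(rec)
    return perturbed

def tabulate(records, keys):
    """Count records in each cell defined by the given key variables."""
    return Counter(tuple(rec[k] for k in keys) for rec in records)

# Synthetic microdata (illustrative only, not real survey data).
rng = random.Random(2024)
microdata = [
    {"area": rng.choice("ABC"), "mode": rng.choice(["car", "bus", "walk"])}
    for _ in range(500)
]

# Perturb once, before any tabulation, then build every table from the same file.
perturbed = perturb(microdata, rate=0.2, rng=rng)
by_area_mode = tabulate(perturbed, ["area", "mode"])
by_area = tabulate(perturbed, ["area"])
by_mode = tabulate(perturbed, ["mode"])

# Additivity: each one-way margin equals the sum of the matching two-way cells.
for (area,), n in by_area.items():
    assert n == sum(c for (a, _), c in by_area_mode.items() if a == area)
for (mode,), n in by_mode.items():
    assert n == sum(c for (_, m), c in by_area_mode.items() if m == mode)
```

Perturbing each published table separately after tabulation would not guarantee these equalities in general, which is the motivation for working at the microdata level. The sketch does not address the variance question: the cell counts above now carry perturbation variance in addition to any sampling variance.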
