Synthetic establishment microdata around the world
Abstract
Statistical agencies around the world release many public-use microdata samples for individuals and households, but virtually no establishment or firm microdata. In large part, this difficulty in providing access to business microdata is due to the skewed and sparse distributions that characterize business data. Synthetic data are simulated data generated from statistical models. We organized sessions at the 2015 World Statistical Congress and the 2015 Joint Statistical Meetings, highlighting work on synthetic establishment microdata. This overview situates those papers, published in this issue, within the broader literature.