Note: [1] The views expressed in this document are those of the author alone and do not necessarily represent those of Statistics Canada or the Government of Canada.
Abstract: Under the Modernization programme that Statistics Canada has recently undertaken, the Agency is to put forward data access solutions that offer greater analytical value to Canadians while upholding its core value of protecting the confidentiality of respondents’ information. One avenue currently being explored is Data Synthesis as a means of delivering synthetic data with high analytical value to users. At the time of writing, Statistics Canada has publicly released synthetic versions of two datasets related to census, mortality and cancer information. In both cases, the synthetic data were generated using the R package synthpop. This paper describes the use of Data Synthesis as a proof of concept for modernizing Statistics Canada’s data access solutions.
Keywords: Synthetic data, data access, confidentiality, digital government, machine learning