Affiliations: [a] Department of CSE, UCE, BIT Campus, Anna University, Trichy, Tamil Nadu, India | [b] Department of CSE, M.A.M College of Engineering & Technology, Trichy, Tamil Nadu, India
Abstract: The rapid advancement of information technology has made information storage and retrieval more efficient. As a result, many organizations, businesses, and governments release and exchange large amounts of microdata among themselves for commercial or research purposes. However, improper data exchange can lead to privacy breaches. Many methods and strategies have been developed to address such breaches, and anonymization is one that many companies use. Identifying the quasi-identifier (QI) is an essential step in performing anonymization. Hence, this paper proposes a method called Quasi Identification Based on Tree (QIBT) for automatic QI identification. The proposed method derives the QI from the relationship between the numbers of distinct values assumed by sets of attributes, and it uses a tree data structure to derive the unique and infrequent attribute values from the entire dataset at low computational cost. The method consists of four phases: (i) unique attribute value computation, (ii) tree construction, (iii) computation of the quasi-identifier from the tree, and (iv) application of an anonymization technique to the identified QI. The proposed algorithm thereby identifies the attributes with a high risk of disclosure. To improve data utility, synthetic data are generated only for the detected QI using a partially synthetic data generation technique. The efficiency of the proposed method is evaluated on a subset of a dataset from the UCI Machine Learning Repository, where it produces better results than existing approaches.
Keywords: Quasi identifier, sensitive attribute, data privacy, anonymization, tree, synthetic data
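As a rough illustration of the distinct-value idea summarized in the abstract, the following Python sketch flags minimal attribute subsets whose projected value combinations are (nearly) all distinct across the records. It is a generic brute-force check, not the tree-based QIBT algorithm; the function names `distinct_ratio` and `candidate_qis`, the `threshold` and `max_size` parameters, and the sample records are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def distinct_ratio(records: Sequence[Record], attrs: Tuple[str, ...]) -> float:
    """Number of distinct value combinations on attrs divided by the record count."""
    if not records:
        return 0.0
    projections = {tuple(r[a] for a in attrs) for r in records}
    return len(projections) / len(records)


def candidate_qis(
    records: Sequence[Record],
    attributes: Sequence[str],
    threshold: float = 1.0,  # hypothetical cut-off: 1.0 means every record is unique
    max_size: int = 3,       # hypothetical limit on the size of attribute subsets
) -> List[Tuple[str, ...]]:
    """Return minimal attribute subsets whose distinct ratio reaches the threshold."""
    found: List[Tuple[str, ...]] = []
    for size in range(1, max_size + 1):
        for attrs in combinations(attributes, size):
            # Skip supersets of a subset already reported, so only minimal ones remain.
            if any(set(f) <= set(attrs) for f in found):
                continue
            if distinct_ratio(records, attrs) >= threshold:
                found.append(attrs)
    return found


if __name__ == "__main__":
    # Hypothetical toy records; "disease" plays the role of the sensitive attribute.
    sample = [
        {"age": "34", "zip": "620001", "sex": "M", "disease": "flu"},
        {"age": "34", "zip": "620002", "sex": "F", "disease": "cold"},
        {"age": "45", "zip": "620001", "sex": "M", "disease": "flu"},
        {"age": "45", "zip": "620002", "sex": "M", "disease": "asthma"},
    ]
    print(candidate_qis(sample, ["age", "zip", "sex"]))
```

On the toy records, no single attribute separates the rows, but the pair (age, zip) does, so it is reported as a candidate QI. This exhaustive enumeration grows combinatorially with the number of attributes, which is the kind of cost a tree-based method aims to reduce.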