@Article{info:doi/10.2196/23139, author="El Emam, Khaled and Mosquera, Lucy and Bass, Jason", title="Evaluating Identity Disclosure Risk in Fully Synthetic Health Data: Model Development and Validation", journal="J Med Internet Res", year="2020", month="Nov", day="16", volume="22", number="11", pages="e23139", keywords="synthetic data; privacy; data sharing; data access; de-identification; open data", abstract="Background: There has been growing interest in data synthesis for enabling the sharing of data for secondary analysis; however, there is a need for a comprehensive privacy risk model for fully synthetic data: If the generative models have been overfit, then it is possible to identify individuals from synthetic data and learn something new about them. Objective: The purpose of this study is to develop and apply a methodology for evaluating the identity disclosure risks of fully synthetic data. Methods: A full risk model is presented, which evaluates both identity disclosure and the ability of an adversary to learn something new if there is a match between a synthetic record and a real person. We term this ``meaningful identity disclosure risk.'' The model is applied on samples from the Washington State Hospital discharge database (2007) and the Canadian COVID-19 cases database. Both of these datasets were synthesized using a sequential decision tree process commonly used to synthesize health and social science data. Results: The meaningful identity disclosure risk for both of these synthesized samples was below the commonly used 0.09 risk threshold (0.0198 and 0.0086, respectively), and 4 times and 5 times lower than the risk values for the original datasets, respectively. Conclusions: We have presented a comprehensive identity disclosure risk model for fully synthetic data. The results for this synthesis method on 2 datasets demonstrate that synthesis can reduce meaningful identity disclosure risks considerably. 
The risk model can be applied in the future to evaluate the privacy of fully synthetic data.", issn="1438-8871", doi="10.2196/23139", url="https://doi.org/10.2196/23139", note="Also available at http://www.jmir.org/2020/11/e23139/ and http://www.ncbi.nlm.nih.gov/pubmed/33196453" }