Published in Vol 22, No 11 (2020): November

Evaluating Identity Disclosure Risk in Fully Synthetic Health Data: Model Development and Validation


  1. Azizi Z, Zheng C, Mosquera L, Pilote L, El Emam K. Can synthetic data be a proxy for real clinical trial data? A validation study. BMJ Open 2021;11(4):e043497
  2. Libbi C, Trienes J, Trieschnigg D, Seifert C. Generating Synthetic Training Data for Supervised De-Identification of Electronic Health Records. Future Internet 2021;13(5):136
  3. Zhang Z, Yan C, Malin B. Keeping synthetic patients on track: feedback mechanisms to mitigate performance drift in longitudinal health data simulation. Journal of the American Medical Informatics Association 2022;29(11):1890
  4. Hernandez M, Epelde G, Alberdi A, Cilla R, Rankin D. Synthetic data generation for tabular health records: A systematic review. Neurocomputing 2022;493:28
  5. Kamel Boulos M, Kwan M, El Emam K, Chung A, Gao S, Richardson D. Reconciling public health common good and individual privacy: new methods and issues in geoprivacy. International Journal of Health Geographics 2022;21(1)
  6. Yan C, Yan Y, Wan Z, Zhang Z, Omberg L, Guinney J, Mooney S, Malin B. A Multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 2022;13(1)
  7. Thomas J, Foraker R, Zamstein N, Morrow J, Payne P, Wilcox A, Haendel M, Chute C, Gersing K, Walden A, Bennett T, Eichmann D, Guinney J, Kibbe W, Liu H, Pfaff E, Robinson P, Saltz J, Spratt H, Starren J, Suver C, Williams A, Wu C, Gabriel D, Hong S, Kostka K, Lehmann H, Moffitt R, Morris M, Palchuk M, Zhang X, Zhu R, Amor B, Bissell M, Clark M, Girvin A, Lee A, Miller R, Walters K, Chae Y, Cook C, Dest A, Dietz R, Dillon T, Francis P, Fuentes R, Graves A, McMurry J, Neumann A, O'Neil S, Sheikh U, Volz A, Zampino E, Austin C, Bozzette S, Deacy M, Garbarini N, Kurilla M, Michael S, Rutter J, Temple-O'Connor M, Bradwell K, Manna A, Qureshi N, Saltz M, Bramante C, Harper J, Hernandez W, Koraishy F, Mariona F, Mattapally S, Saha A, Vedula S, Fu Y, Mathews N, Mendelevitch O. Demonstrating an approach for evaluating synthetic geospatial and temporal epidemiologic data utility: results from analyzing >1.8 million SARS-CoV-2 tests in the United States National COVID Cohort Collaborative (N3C). Journal of the American Medical Informatics Association 2022;29(8):1350
  8. El Emam K, Mosquera L, Fang X, El-Hussuna A. Utility Metrics for Evaluating Synthetic Health Data Generation Methods: Validation Study. JMIR Medical Informatics 2022;10(4):e35734
  9. Rodrigues M, Postolache O, Cercas F. Unobtrusive Cardio-Respiratory Assessment for Different Indoor Environmental Conditions. IEEE Sensors Journal 2022;22(23):23243
  10. Kuo N, Polizzotto M, Finfer S, Garcia F, Sönnerborg A, Zazzi M, Böhm M, Kaiser R, Jorm L, Barbieri S. The Health Gym: synthetic health-related datasets for the development of reinforcement learning algorithms. Scientific Data 2022;9(1)
  11. Banerjee S, Bishop T. dsSynthetic: synthetic data generation for the DataSHIELD federated analysis system. BMC Research Notes 2022;15(1)
  12. Rajotte J, Bergen R, Buckeridge D, El Emam K, Ng R, Strome E. Synthetic data as an enabler for machine learning applications in medicine. iScience 2022;25(11):105331
  13. Girolamo T, Castro N, Hendricks A, Ghali S, Eigsti I. Implementation of Open Science Practices in Communication Sciences and Disorders Research With Black, Indigenous, and People of Color. Journal of Speech, Language, and Hearing Research 2023;66(6):2010
  14. Braddon A, Robinson S, Alati R, Betts K. Exploring the utility of synthetic data to extract more value from sensitive health data assets: A focused example in perinatal epidemiology. Paediatric and Perinatal Epidemiology 2023;37(4):292
  15. Guillaudeux M, Rousseau O, Petot J, Bennis Z, Dein C, Goronflot T, Vince N, Limou S, Karakachoff M, Wargny M, Gourraud P. Patient-centric synthetic data generation, no reason to risk re-identification in biomedical data analysis. npj Digital Medicine 2023;6(1)
  16. An S, Doan T, Lee J, Kim J, Kim Y, Kim Y, Yoon C, Jung S, Kim D, Kwon S, Kim H, Ahn J, Park C. A comparison of synthetic data approaches using utility and disclosure risk measures. Korean Journal of Applied Statistics 2023;36(2):141
  17. Flanagin A, Curfman G, Bibbins-Domingo K. The Growth of Medical Knowledge and Data Sharing—Reply. JAMA 2023;329(15):1315
  18. El Emam K. Status of Synthetic Data Generation for Structured Health Data. JCO Clinical Cancer Informatics 2023;(7)
  19. Kuo N, Garcia F, Sönnerborg A, Böhm M, Kaiser R, Zazzi M, Polizzotto M, Jorm L, Barbieri S. Generating synthetic clinical data that capture class imbalanced distributions with generative adversarial networks: Example using antiretroviral therapy for HIV. Journal of Biomedical Informatics 2023;144:104436
  20. Ariana Kia E, Rahimi C, Mohammadi N. Comparing online group therapy based on emotional schema therapy with transdiagnostic therapy in improving distress tolerance and cognitive emotion regulation among university students with adjustment disorders due to romantic break‐ups. Counselling and Psychotherapy Research 2024;24(1):372
  21. Alloza C, Knox B, Raad H, Aguilà M, Coakley C, Mohrova Z, Boin É, Bénard M, Davies J, Jacquot E, Lecomte C, Fabre A, Batech M. A Case for Synthetic Data in Regulatory Decision‐Making in Europe. Clinical Pharmacology & Therapeutics 2023;114(4):795
  22. Svendsen V, Wijnen B, De Vos J, Veenstra R, Evers S, Lokkerbol J. A roadmap for applying machine learning when working with privacy-sensitive data: predicting non-response to treatment for eating disorders. Expert Review of Pharmacoeconomics & Outcomes Research 2023;23(8):933
  23. El Kababji S, Mitsakakis N, Fang X, Beltran-Bless A, Pond G, Vandermeer L, Radhakrishnan D, Mosquera L, Paterson A, Shepherd L, Chen B, Barlow W, Gralow J, Savard M, Clemons M, El Emam K. Evaluating the Utility and Privacy of Synthetic Breast Cancer Clinical Trial Data Sets. JCO Clinical Cancer Informatics 2023;(7)

Books/Policy Documents

  1. Bullward A, Aljebreen A, Coles A, McInerney C, Johnson O. Process Mining Workshops.