Published in Vol 22, No 11 (2020): November

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/23139.
Evaluating Identity Disclosure Risk in Fully Synthetic Health Data: Model Development and Validation

Authors of this article:

Khaled El Emam1,2,3; Lucy Mosquera3; Jason Bass3

Journals

  1. Azizi Z, Zheng C, Mosquera L, Pilote L, El Emam K. Can synthetic data be a proxy for real clinical trial data? A validation study. BMJ Open 2021;11(4):e043497
  2. Libbi C, Trienes J, Trieschnigg D, Seifert C. Generating Synthetic Training Data for Supervised De-Identification of Electronic Health Records. Future Internet 2021;13(5):136
  3. Zhang Z, Yan C, Malin B. Keeping synthetic patients on track: feedback mechanisms to mitigate performance drift in longitudinal health data simulation. Journal of the American Medical Informatics Association 2022;29(11):1890
  4. Hernandez M, Epelde G, Alberdi A, Cilla R, Rankin D. Synthetic data generation for tabular health records: A systematic review. Neurocomputing 2022;493:28
  5. Kamel Boulos M, Kwan M, El Emam K, Chung A, Gao S, Richardson D. Reconciling public health common good and individual privacy: new methods and issues in geoprivacy. International Journal of Health Geographics 2022;21(1)
  6. Yan C, Yan Y, Wan Z, Zhang Z, Omberg L, Guinney J, Mooney S, Malin B. A Multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 2022;13(1)
  7. Thomas J, Foraker R, Zamstein N, Morrow J, Payne P, Wilcox A, Haendel M, Chute C, Gersing K, Walden A, Bennett T, Eichmann D, Guinney J, Kibbe W, Liu H, Pfaff E, Robinson P, Saltz J, Spratt H, Starren J, Suver C, Williams A, Wu C, Gabriel D, Hong S, Kostka K, Lehmann H, Moffitt R, Morris M, Palchuk M, Zhang X, Zhu R, Amor B, Bissell M, Clark M, Girvin A, Lee A, Miller R, Walters K, Chae Y, Cook C, Dest A, Dietz R, Dillon T, Francis P, Fuentes R, Graves A, McMurry J, Neumann A, O'Neil S, Sheikh U, Volz A, Zampino E, Austin C, Bozzette S, Deacy M, Garbarini N, Kurilla M, Michael S, Rutter J, Temple-O'Connor M, Bradwell K, Manna A, Qureshi N, Saltz M, Bramante C, Harper J, Hernandez W, Koraishy F, Mariona F, Mattapally S, Saha A, Vedula S, Fu Y, Mathews N, Mendelevitch O. Demonstrating an approach for evaluating synthetic geospatial and temporal epidemiologic data utility: results from analyzing >1.8 million SARS-CoV-2 tests in the United States National COVID Cohort Collaborative (N3C). Journal of the American Medical Informatics Association 2022;29(8):1350
  8. El Emam K, Mosquera L, Fang X, El-Hussuna A. Utility Metrics for Evaluating Synthetic Health Data Generation Methods: Validation Study. JMIR Medical Informatics 2022;10(4):e35734
  9. Rodrigues M, Postolache O, Cercas F. Unobtrusive Cardio-Respiratory Assessment for Different Indoor Environmental Conditions. IEEE Sensors Journal 2022;22(23):23243
  10. Kuo N, Polizzotto M, Finfer S, Garcia F, Sönnerborg A, Zazzi M, Böhm M, Kaiser R, Jorm L, Barbieri S. The Health Gym: synthetic health-related datasets for the development of reinforcement learning algorithms. Scientific Data 2022;9(1)
  11. Banerjee S, Bishop T. dsSynthetic: synthetic data generation for the DataSHIELD federated analysis system. BMC Research Notes 2022;15(1)
  12. Rajotte J, Bergen R, Buckeridge D, El Emam K, Ng R, Strome E. Synthetic data as an enabler for machine learning applications in medicine. iScience 2022;25(11):105331
  13. Girolamo T, Castro N, Hendricks A, Ghali S, Eigsti I. Implementation of Open Science Practices in Communication Sciences and Disorders Research With Black, Indigenous, and People of Color. Journal of Speech, Language, and Hearing Research 2023;66(6):2010
  14. Braddon A, Robinson S, Alati R, Betts K. Exploring the utility of synthetic data to extract more value from sensitive health data assets: A focused example in perinatal epidemiology. Paediatric and Perinatal Epidemiology 2023;37(4):292
  15. Guillaudeux M, Rousseau O, Petot J, Bennis Z, Dein C, Goronflot T, Vince N, Limou S, Karakachoff M, Wargny M, Gourraud P. Patient-centric synthetic data generation, no reason to risk re-identification in biomedical data analysis. npj Digital Medicine 2023;6(1)
  16. An S, Doan T, Lee J, Kim J, Kim Y, Kim Y, Yoon C, Jung S, Kim D, Kwon S, Kim H, Ahn J, Park C. A comparison of synthetic data approaches using utility and disclosure risk measures. Korean Journal of Applied Statistics 2023;36(2):141
  17. Flanagin A, Curfman G, Bibbins-Domingo K. The Growth of Medical Knowledge and Data Sharing—Reply. JAMA 2023;329(15):1315
  18. El Emam K. Status of Synthetic Data Generation for Structured Health Data. JCO Clinical Cancer Informatics 2023;(7)
  19. Kuo N, Garcia F, Sönnerborg A, Böhm M, Kaiser R, Zazzi M, Polizzotto M, Jorm L, Barbieri S. Generating synthetic clinical data that capture class imbalanced distributions with generative adversarial networks: Example using antiretroviral therapy for HIV. Journal of Biomedical Informatics 2023;144:104436
  20. Ariana Kia E, Rahimi C, Mohammadi N. Comparing online group therapy based on emotional schema therapy with transdiagnostic therapy in improving distress tolerance and cognitive emotion regulation among university students with adjustment disorders due to romantic break‐ups. Counselling and Psychotherapy Research 2024;24(1):372
  21. Alloza C, Knox B, Raad H, Aguilà M, Coakley C, Mohrova Z, Boin É, Bénard M, Davies J, Jacquot E, Lecomte C, Fabre A, Batech M. A Case for Synthetic Data in Regulatory Decision‐Making in Europe. Clinical Pharmacology & Therapeutics 2023;114(4):795
  22. Svendsen V, Wijnen B, De Vos J, Veenstra R, Evers S, Lokkerbol J. A roadmap for applying machine learning when working with privacy-sensitive data: predicting non-response to treatment for eating disorders. Expert Review of Pharmacoeconomics & Outcomes Research 2023;23(8):933
  23. El Kababji S, Mitsakakis N, Fang X, Beltran-Bless A, Pond G, Vandermeer L, Radhakrishnan D, Mosquera L, Paterson A, Shepherd L, Chen B, Barlow W, Gralow J, Savard M, Clemons M, El Emam K. Evaluating the Utility and Privacy of Synthetic Breast Cancer Clinical Trial Data Sets. JCO Clinical Cancer Informatics 2023;(7)
  24. Yan C, Zhang Z, Nyemba S, Li Z. Generating Synthetic Electronic Health Record Data Using Generative Adversarial Networks: Tutorial. JMIR AI 2024;3:e52615
  25. Zuiderwijk A. Researchers’ Willingness and Ability to Openly Share Their Research Data: A Survey of COVID-19 Pandemic-Related Factors. Sage Open 2024;14(1)
  26. Scroggins J, Topaz M, Song J, Zolnoori M. Does synthetic data augmentation improve the performances of machine learning classifiers for identifying health problems in patient–nurse verbal communications in home healthcare settings? Journal of Nursing Scholarship 2024
  27. Budu E, Etminani K, Soliman A, Rögnvaldsson T. Evaluation of synthetic electronic health records: A systematic review and experimental assessment. Neurocomputing 2024;603:128253
  28. Fadel M, Petot J, Gourraud P, Descatha A, Ali M. Flexibility of a large blindly synthetized avatar database for occupational research: Example from the CONSTANCES cohort for stroke and knee pain. PLOS ONE 2024;19(7):e0308063
  29. Kim J, Choo H, Shin S, Song K. Synthesis and quality assessment of combined time-series and static medical data using a real-world time-series generative adversarial network. Scientific Reports 2024;14(1)

Books/Policy Documents

  1. Bullward A, Aljebreen A, Coles A, McInerney C, Johnson O. Process Mining Workshops.
  2. Trindade C, Antunes L, Carvalho T, Moniz N. Privacy in Statistical Databases.