Published in Vol 26 (2024)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/53724.
Evaluating the Diagnostic Performance of Large Language Models on Complex Multimodal Medical Cases

Journals

  1. Shmilovitch A, Katson M, Cohen-Shelly M, Peretz S, Aran D, Shelly S. GPT-4 as a Clinical Decision Support Tool in Ischemic Stroke Management: Evaluation Study. JMIR AI 2025;4:e60391
  2. Takita H, Kabata D, Walston S, Tatekawa H, Saito K, Tsujimoto Y, Miki Y, Ueda D. A systematic review and meta-analysis of diagnostic performance comparison between generative AI and physicians. npj Digital Medicine 2025;8(1)
  3. Lafourcade C, Kérourédan O, Ballester B, Richert R. Accuracy, consistency, and contextual understanding of large language models in restorative dentistry and endodontics. Journal of Dentistry 2025;157:105764
  4. Naliyatthaliyazchayil P, Muthyala R, Gichoya J, Purkayastha S. Evaluating the Reasoning Capabilities of Large Language Models for Medical Coding and Hospital Readmission Risk Stratification: Zero-Shot Prompting Approach. Journal of Medical Internet Research 2025;27:e74142
  5. Derbal Y. Generative AI - Assisted Adaptive Cancer Therapy. Cancer Control 2025;32
  6. Cheah B, Vicente C, Chan K. Machine Learning and Artificial Intelligence for Infectious Disease Surveillance, Diagnosis, and Prognosis. Viruses 2025;17(7):882
  7. Qiang S, Zhang H, Liao Y, Zhang Y, Gu Y, Wang Y, Xu Z, Shi H, Han N, Yu H. Application of Large Language Models in Stroke Rehabilitation Health Education: 2-Phase Study. Journal of Medical Internet Research 2025;27:e73226
  8. Li Q, Liu H, Guo C, Gao C, Chen D, Wang M, Gao F, van Harmelen F, Gu J. Reviewing clinical knowledge in medical large language models: Training and beyond. Knowledge-Based Systems 2025;328:114215
  9. Luo P, Fan C, Li A, Jiang T, Jiang A, Qi C, Gan W, Zhu L, Mou W, Zeng D, Tang B, Xiao M, Chu G, Liang Z, Shen J, Liu Z, Wei T, Cheng Q, Lin A, Chen X. Performance analysis of large language models in multi-disease detection from chest computed tomography reports: a comparative study. International Journal of Surgery 2025;111(8):5071
  10. Sarvari P, Al-fagih Z. Rapidly Benchmarking Large Language Models for Diagnosing Comorbid Patients: Comparative Study Leveraging the LLM-as-a-Judge Method. JMIRx Med 2025;6:e67661
  11. Yang Y, Jin Q, Huang F, Lu Z. Adversarial prompt and fine-tuning attacks threaten medical large language models. Nature Communications 2025;16(1)