Evaluating the Diagnostic Performance of Large Language Models on Complex Multimodal Medical Cases

doi:10.2196/53724

Published on 13.May.2024 in Vol 26 (2024)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/53724, first published 17.Oct.2023.

Wan Hang Keith Chiu¹; Wei Sum Koel Ko¹; William Chi Shing Cho²; Sin Yu Joanne Hui³; Wing Chi Lawrence Chan⁴; Michael D Kuo⁵

Cited by (14)

Journals

Shmilovitch A, Katson M, Cohen-Shelly M, Peretz S, Aran D, Shelly S. GPT-4 as a Clinical Decision Support Tool in Ischemic Stroke Management: Evaluation Study. JMIR AI 2025;4:e60391
Takita H, Kabata D, Walston S, Tatekawa H, Saito K, Tsujimoto Y, Miki Y, Ueda D. A systematic review and meta-analysis of diagnostic performance comparison between generative AI and physicians. npj Digital Medicine 2025;8(1)
Lafourcade C, Kérourédan O, Ballester B, Richert R. Accuracy, consistency, and contextual understanding of large language models in restorative dentistry and endodontics. Journal of Dentistry 2025;157:105764
Naliyatthaliyazchayil P, Muthyala R, Gichoya J, Purkayastha S. Evaluating the Reasoning Capabilities of Large Language Models for Medical Coding and Hospital Readmission Risk Stratification: Zero-Shot Prompting Approach. Journal of Medical Internet Research 2025;27:e74142
Derbal Y. Generative AI - Assisted Adaptive Cancer Therapy. Cancer Control 2025;32
Cheah B, Vicente C, Chan K. Machine Learning and Artificial Intelligence for Infectious Disease Surveillance, Diagnosis, and Prognosis. Viruses 2025;17(7):882
Qiang S, Zhang H, Liao Y, Zhang Y, Gu Y, Wang Y, Xu Z, Shi H, Han N, Yu H. Application of Large Language Models in Stroke Rehabilitation Health Education: 2-Phase Study. Journal of Medical Internet Research 2025;27:e73226
Li Q, Liu H, Guo C, Gao C, Chen D, Wang M, Gao F, van Harmelen F, Gu J. Reviewing clinical knowledge in medical large language models: Training and beyond. Knowledge-Based Systems 2025;328:114215
Luo P, Fan C, Li A, Jiang T, Jiang A, Qi C, Gan W, Zhu L, Mou W, Zeng D, Tang B, Xiao M, Chu G, Liang Z, Shen J, Liu Z, Wei T, Cheng Q, Lin A, Chen X. Performance analysis of large language models in multi-disease detection from chest computed tomography reports: a comparative study. International Journal of Surgery 2025;111(8):5071
Sarvari P, Al-fagih Z. Rapidly Benchmarking Large Language Models for Diagnosing Comorbid Patients: Comparative Study Leveraging the LLM-as-a-Judge Method. JMIRx Med 2025;6:e67661
Yang Y, Jin Q, Huang F, Lu Z. Adversarial prompt and fine-tuning attacks threaten medical large language models. Nature Communications 2025;16(1)
Reese J, Chimirri L, Bridges Y, Danis D, Caufield J, Gargano M, Kroll C, Schmeder A, Liu F, Wissink K, McMurry J, Graefe A, Niyonkuru E, Korn D, Casiraghi E, Valentini G, Jacobsen J, Haendel M, Smedley D, Mungall C, Robinson P. Systematic benchmarking demonstrates large language models have not reached the diagnostic accuracy of traditional rare-disease decision support tools. European Journal of Human Genetics 2026;34(4):498
Ebrahimi Warkiani M, Safari B, Moattar M, Hosseinzadeh M. A systematic review of multimodal large language models for medical decision support: Architectures, fusion methods, and real-world applications. Applied Soft Computing 2026;196:115051

Conference Proceedings

Raputri E, Teguh A, Hidayah S, Anom A, Setiawan F, Qomariyah N. Retrieval-Augmented LLMs with Indonesian Clinical Trials Guidelines: A Comparative Study. 2025 8th International Seminar on Research of Information Technology and Intelligent Systems (ISRITI)
