The Triage and Diagnostic Accuracy of Frontier Large Language Models: Updated Comparison to Physician Performance

doi:10.2196/67409

Journals

Menz B, Modi N, Abuhelwa A, Ruanglertboon W, Vitry A, Gao Y, Li L, Chhetri R, Chu B, Bacchi S, Kichenadasse G, Shahnam A, Rowland A, Sorich M, Hopkins A. Generative AI chatbots for reliable cancer information: Evaluating web-search, multilingual, and reference capabilities of emerging large language models. European Journal of Cancer 2025;218:115274
Gao C, Satheakeerthy S, Guo C, Pradhan A, Booth A, Chan W, Kanjilal S, Roberts M, Kotton C, Bacchi S. Large language models for infectious diseases require evidence generation and regulation. Internal Medicine Journal 2025;55(7):1198
Modi N, Menz B, Awaty A, Alex C, Logan J, McKinnon R, Rowland A, Bacchi S, Gradon K, Sorich M, Hopkins A. Assessing the System-Instruction Vulnerabilities of Large Language Models to Malicious Conversion Into Health Disinformation Chatbots. Annals of Internal Medicine 2025;178(8):1172
Alomari L, Alshammari M, Arbaeen A, Alshehri R, Almalki H. Safety and accuracy of AI in triaging patients in the emergency department. International Journal of Emergency Medicine 2025;18(1)
Wang X, Wang Q, Ding G, Wang J, Tang Y, Feng Y. Artificial intelligence in multidisciplinary tumor boards enhancing decision making and clinical outcomes in oncology. iScience 2025;28(12):114082
Menz B, Scarfo N, Modi N, Cornelisse E, Li L, Tan J, Gandhi J, Maher D, Kousa D, Daniel K, Menon V, Bacchi S, McKinnon R, Wiese M, Rowland A, Sorich M, Hopkins A. Vision-Enabled AI scribes reduce omissions in clinical conversations: evidence from simulated medication histories. npj Digital Medicine 2026
Chu B, Modi N, Menz B, Cornelisse E, Bacchi S, Bulamu N, Ullah S, McKinnon R, Gradon K, Rowland A, Sorich M, Hopkins A. Evaluation of Generative Artificial Intelligence Safeguards Against the Creation of Images and Videos Harmful to Public Health. Public Health Reports 2026

Conference Proceedings

Hassanein F, El-Guindy J, Ahmed Y, Abou-Bakr A. Evaluating Multimodal Large Language Models for Clinical Diagnosis of Oral Lesions: A Biomedical Informatics Perspective. 2025 Twelfth International Conference on Intelligent Computing and Information Systems (ICICIS)
