Evaluation of GPT-4’s Chest X-Ray Impression Generation: A Reader Study on Performance and Perception

Li J, Dada A, Puladi B, Kleesiek J, Egger J. ChatGPT in healthcare: A taxonomy and systematic review. Computer Methods and Programs in Biomedicine 2024;245:108013
Wu Q, Wu Q, Li H, Wang Y, Bai Y, Wu Y, Yu X, Li X, Dong P, Xue J, Shen D, Wang M. Evaluating Large Language Models for Automated Reporting and Data Systems Categorization: Cross-Sectional Study. JMIR Medical Informatics 2024;12:e55799
Shiraishi M, Miyamoto S, Takeishi H, Kurita D, Furuse K, Ohba J, Moriwaki Y, Fujisawa K, Okazaki M. The Potential of Chat-Based Artificial Intelligence Models in Differentiating Between Keloid and Hypertrophic Scars: A Pilot Study. Aesthetic Plastic Surgery 2024;48(24):5367
Mukherjee P, Hou B, Suri A, Zhuang Y, Parnell C, Lee N, Stroie O, Jain R, Wang K, Sharma K, Summers R. Evaluation of GPT Large Language Model Performance on RSNA 2023 Case of the Day Questions. Radiology 2024;313(1)
Chang Y, Yin J, Li J, Liu C, Cao L, Lin S. Applications and Future Prospects of Medical LLMs: A Survey Based on the M-KAT Conceptual Framework. Journal of Medical Systems 2024;48(1)
Yang X, Li T, Su Q, Liu Y, Kang C, Lyu Y, Zhao L, Nie Y, Pan Y. Application of large language models in disease diagnosis and treatment. Chinese Medical Journal 2025;138(2):130
Su Y, Yang S, Liu Y, Kai A, Chen L, Liu M. Knowledge discovery from porous organic cage literature using a large language model. Digital Discovery 2025;4(2):403
Altalla’ B, Ahmad A, Bitar L, Al-Bssol M, Al Omari A, Sultan I, Sarkar S. Radiology Report Annotation Using Generative Large Language Models: Comparative Analysis. International Journal of Biomedical Imaging 2025;2025(1)
Zhou Z, Qin P, Cheng X, Shao M, Ren Z, Zhao Y, Li Q, Liu L. ChatGPT in Oncology Diagnosis and Treatment: Applications, Legal and Ethical Challenges. Current Oncology Reports 2025;27(4):336
Shmilovitch A, Katson M, Cohen-Shelly M, Peretz S, Aran D, Shelly S. GPT-4 as a Clinical Decision Support Tool in Ischemic Stroke Management: Evaluation Study. JMIR AI 2025;4:e60391
Jin G. Artificial intelligence in thoracic imaging—a new paradigm for diagnosing pulmonary diseases: a narrative review. Journal of the Korean Medical Association 2025;68(5):288
Kim S, Schramm S, Wihl J, Raffler P, Tahedl M, Canisius J, Luiken I, Endrös L, Reischl S, Marka A, Walter R, Schillmaier M, Zimmer C, Wiestler B, Hedderich D. Boosting LLM-assisted diagnosis: 10-minute LLM tutorial elevates radiology residents’ performance in brain MRI interpretation. Neuroradiology 2025;67(8):2069
Wihl J, Rosenkranz E, Schramm S, Berberich C, Griessmair M, Woźnicki P, Pinto F, Ziegelmayer S, Adams L, Bressem K, Kirschke J, Zimmer C, Wiestler B, Hedderich D, Kim S. Data extraction from free-text stroke CT reports using GPT-4o and Llama-3.3-70B: the impact of annotation guidelines. European Radiology Experimental 2025;9(1)
Hürsoy N, Kolluk H, Solak M, Budak K, Kaba E. Interpreting Chest X-ray with ChatGPT: Can It Serve as a Tool for Justifying Computed Tomography? Cerasus Journal of Medicine 2025;2(2):118
de Almeida J, Alberich L, Tsakou G, Marias K, Tsiknakis M, Lekadir K, Marti-Bonmati L, Papanikolaou N. Foundation models for radiology—the position of the AI for Health Imaging (AI4HI) network. Insights into Imaging 2025;16(1)
Han M, Liu Y. Evaluating generative artificial intelligence products using fuzzy social network multi-attribute decision-making model: User perspective. Applied Soft Computing 2025;183:113715
Chetla N, Samayamanthula S, Chang J, Leigh A, Akosman S, Tandon M, Hage T, Cusick M. Assessing the Diagnostic Capabilities of ChatGPT-4 Omni in Grading Diabetic Retinopathy Fundoscopy Using Color Fundus Photographs. Clinical Ophthalmology 2025;19:3103
Mavrych V, Yousef E, Yaqinuddin A, Shaikh A, Bolgova O. Evaluating the Reliability of GPT‐4o in Histological Image Interpretation. Clinical Anatomy 2026;39(4):517
Kottlors J, Iuga A, Bluethgen C, Bressem K, Kather J, Moy L, Wald C, Wang W, Liu T, Ranschaert E, Dratsch T, Kleesiek J, Gertz R, Rajpurkar P, Bedayat A, Fink M, Zeeck A, Chaudhari A, Alkasab T, Wu H, Nensa F, Wang B, Große Hokamp N, Laukamp K, Persigehl T, Maintz D, Truhn D, Lennartz S. Guidelines for Reporting Studies on Large Language Models in Radiology: An International Delphi Expert Survey. Radiology 2026;318(2)
Li J, Zhou Z, Wang Z, Lv H. Prioritizing human-AI collaboration in healthcare: the TRIAD framework for trustworthy governance, real-world, and integrated adaptive deployment. Military Medical Research 2026;12(1)
Schramm S, Le Guellec B, Topka M, Svec M, Backhaus P, Eisenkolb V, Riedel E, Beyrle M, Platzek P, Ramschütz C, Paprottka K, Renz M, Bodden J, Kirschke J, Ziegelmayer S, Busch F, Makowski M, Adams L, Bressem K, Hedderich D, Wiestler B, Kim S. Performing Best When Needed Least: Reader Experience Shapes Accuracy Gains in Large Language Model–assisted Brain MRI Differential Diagnosis. Radiology 2026;319(2)
Matsuo H, Nishio M, Fujimoto K, Deperrois N, Matsunaga T, Nooralahzadeh F, Krauthammer M, Murakami T. Artificial intelligence for chest radiography: an overview of techniques, challenges, and future directions. npj Health Systems 2026;3(1)
Grach S, Badawi A, Ahmad F, Dolatabadi E. A scoping review of algorithmic equity, data diversity, and inclusive design in the transformer era of clinical NLP. Journal of Biomedical Informatics 2026;181:105077

Books/Policy Documents

Huang W, Yuan J, Wang Q, Ye Z, Fazi C, Fontanella F, Hernandez-Cruz N. Advanced Intelligent Computing Technology and Applications.

Conference Proceedings

Mahmood R, Yan P, Reyes D, Wang G, Kalra M, Kaviani P, Wu J, Syeda-Mahmood T. Evaluating Automated Radiology Report Quality Through Fine-Grained Phrasal Grounding of Clinical Findings. 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI).

Citation

Please cite as:

Ziegelmayer S, Marka AW, Lenhart N, Nehls N, Reischl S, Harder F, Sauter A, Makowski M, Graf M, Gawlitza J
Evaluation of GPT-4’s Chest X-Ray Impression Generation: A Reader Study on Performance and Perception
J Med Internet Res 2023;25:e50865
doi: 10.2196/50865 PMID: 38133918 PMCID: 10770784

This paper is in the following e-collection/theme issue:

Research Letter; Research Instruments, Questionnaires, and Tools; Chatbots and Conversational Agents; Artificial Intelligence; Generative Language Models Including ChatGPT
