%0 Journal Article
%@ 1438-8871
%I JMIR Publications
%V 27
%N 
%P e67883
%T Evaluating the Effectiveness of Large Language Models in Providing Patient Education for Chinese Patients With Ocular Myasthenia Gravis: Mixed Methods Study
%A Wei,Bin
%A Yao,Lili
%A Hu,Xin
%A Hu,Yuxiang
%A Rao,Jie
%A Ji,Yu
%A Dong,Zhuoer
%A Duan,Yichong
%A Wu,Xiaorong
%+ Jiangxi Medical College, The First Affiliated Hospital, Nanchang University, No.17 Yongwai Zheng Street, Donghu District, Jiangxi Province, Nanchang, 330000, China, 86 136117093259, wxr98021@126.com
%K LLM
%K large language models
%K ocular myasthenia gravis
%K patient education
%K China
%K effectiveness
%K deep learning
%K artificial intelligence
%K health care
%K accuracy
%K applicability
%K neuromuscular disorder
%K extraocular muscles
%K ptosis
%K diplopia
%K ophthalmology
%K ChatGPT
%K clinical practice
%K digital health
%D 2025
%7 10.4.2025
%9 Original Paper
%J J Med Internet Res
%G English
%X Background: Ocular myasthenia gravis (OMG) is a neuromuscular disorder primarily affecting the extraocular muscles, leading to ptosis and diplopia. Effective patient education is crucial for disease management; however, in China, limited health care resources often restrict patients’ access to personalized medical guidance. Large language models (LLMs) have emerged as potential tools to bridge this gap by providing instant, AI-driven health information. However, their accuracy and readability in educating patients with OMG remain uncertain. Objective: The purpose of this study was to systematically evaluate the effectiveness of multiple LLMs in the education of Chinese patients with OMG. Specifically, the validity of these models in answering OMG-related patient questions was assessed in terms of accuracy, completeness, readability, usefulness, and safety, and patients’ ratings of their usability and readability were analyzed. Methods: The study was conducted in two phases. In the first phase, 130 multiple-choice ophthalmology examination questions were input into 5 different LLMs, and their performance was compared with that of undergraduates, master’s students, and ophthalmology residents. In addition, 23 common OMG-related patient questions were posed to 4 LLMs, and their responses were evaluated by ophthalmologists across 5 domains. In the second phase, 20 patients with OMG interacted with the 2 LLMs from the first phase, each asking 3 questions. Patients assessed the responses for satisfaction and readability, while ophthalmologists again evaluated the responses across the same 5 domains. Results: ChatGPT o1-preview achieved the highest accuracy rate of 73% on the 130 ophthalmology examination questions, outperforming the other LLMs as well as the undergraduate and master’s student groups. For the 23 common OMG-related patient questions, ChatGPT o1-preview scored highest in correctness (4.44), completeness (4.44), helpfulness (4.47), and safety (4.6). GEMINI (Google DeepMind) provided the easiest-to-understand responses in readability assessments, while GPT-4o produced the most complex responses, suited to readers with higher education levels. In the second phase with 20 patients with OMG, ChatGPT o1-preview received higher satisfaction scores than Ernie 3.5 (Baidu; 4.40 vs 3.89, P=.002), although Ernie 3.5’s responses were slightly more readable (4.31 vs 4.03, P=.01). Conclusions: LLMs such as ChatGPT o1-preview may have the potential to enhance patient education.
Addressing challenges such as misinformation risk, readability issues, and ethical considerations is crucial for their effective and safe integration into clinical practice.
%R 10.2196/67883
%U https://www.jmir.org/2025/1/e67883
%U https://doi.org/10.2196/67883