TY - JOUR AU - Quon, Stephanie AU - Zhou, Sarah PY - 2025/4/11 TI - Enhancing AI-Driven Medical Translations: Considerations for Language Concordance JO - JMIR Med Educ SP - e70420 VL - 11 KW - letter to the editor KW - ChatGPT KW - AI KW - artificial intelligence KW - language KW - translation KW - health care disparity KW - natural language model KW - survey KW - patient education KW - accessibility KW - preference KW - human language KW - communication KW - language-concordant care UR - https://mededu.jmir.org/2025/1/e70420 UR - http://dx.doi.org/10.2196/70420 ID - info:doi/10.2196/70420 ER - TY - JOUR AU - Teng, Joyce AU - Novoa, Andres Roberto AU - Aleshin, Alexandrovna Maria AU - Lester, Jenna AU - Seiger, Kira AU - Dzuali, Fiatsogbe AU - Daneshjou, Roxana PY - 2025/4/11 TI - Authors' Reply: Enhancing AI-Driven Medical Translations: Considerations for Language Concordance JO - JMIR Med Educ SP - e71721 VL - 11 KW - ChatGPT KW - artificial intelligence KW - language KW - translation KW - health care disparity KW - natural language model KW - survey KW - patient education KW - accessibility KW - preference KW - human language KW - communication KW - language-concordant care UR - https://mededu.jmir.org/2025/1/e71721 UR - http://dx.doi.org/10.2196/71721 ID - info:doi/10.2196/71721 ER - TY - JOUR AU - Six, Stephanie AU - Schlesener, Elizabeth AU - Hill, Victoria AU - Babu, V.
Sabarish AU - Byrne, Kaileigh PY - 2025/4/11 TI - Impact of Conversational and Animation Features of a Mental Health App Virtual Agent on Depressive Symptoms and User Experience Among College Students: Randomized Controlled Trial JO - JMIR Ment Health SP - e67381 VL - 12 KW - depression KW - mental health app KW - virtual agents KW - cognitive behavioral therapy KW - conversational agents KW - virtual agent KW - animations KW - college student KW - CBT KW - ANOVA KW - randomized controlled trial KW - depressive symptoms KW - mental disorder KW - mental illness KW - user experience KW - mHealth KW - digital health N2 - Background: Numerous mental health apps purport to alleviate depressive symptoms. Strong evidence suggests that brief cognitive behavioral therapy (bCBT)-based mental health apps can decrease depressive symptoms, yet there is limited research elucidating the specific features that may augment their therapeutic benefits. One potential design feature that may influence effectiveness and user experience is the inclusion of virtual agents that can mimic realistic, human face-to-face interactions. Objective: The goal of the current experiment was to determine the effect of conversational and animation features of a virtual agent within a bCBT-based mental health app on depressive symptoms and user experience in college students with and without depressive symptoms. Methods: College students (N=209) completed a 2-week intervention in which they engaged with a bCBT-based mental health app with a customizable therapeutic virtual agent that varied in conversational and animation features.
A 2 (time: baseline vs 2-week follow-up) × 2 (conversational vs non-conversational agent) × 2 (animated vs non-animated agent) randomized controlled trial was used to assess mental health symptoms (Patient Health Questionnaire-8, Perceived Stress Scale-10, and Response Rumination Scale questionnaires) and user experience (mHealth App Usability Questionnaire, MAUQ) in college students with and without current depressive symptoms. The mental health app usability and qualitative questions regarding users' perceptions of their therapeutic virtual agent interactions and customization process were assessed at follow-up. Results: Mixed ANOVA (analysis of variance) results demonstrated a significant decrease in symptoms of depression (P=.002; mean [SD]=5.5 [4.86] at follow-up vs mean [SD]=6.35 [4.71] at baseline), stress (P=.005; mean [SD]=15.91 [7.67] at follow-up vs mean [SD]=17.02 [6.81] at baseline), and rumination (P=.03; mean [SD]=40.42 [12.96] at follow-up vs mean [SD]=41.92 [13.61] at baseline); however, no significant effect of conversation or animation was observed. Findings also indicate a significant increase in user experience in the animated conditions, reflected in users' ease of use and satisfaction (F(1, 201)=102.60, P<.001), system information arrangement (F(1, 201)=123.12, P<.001), and usefulness of the application (F(1, 201)=3667.62, P<.001). Conclusions: The current experiment provides support for bCBT-based mental health apps featuring customizable, humanlike therapeutic virtual agents and their ability to significantly reduce negative symptomology over a brief timeframe. The app intervention reduced mental health symptoms, regardless of whether the agent included conversational or animation features, but animation features enhanced the user experience. These effects were observed in users both with and without depressive symptoms.
Trial Registration: Open Science Framework B2HX5; https://doi.org/10.17605/OSF.IO/B2HX5 UR - https://mental.jmir.org/2025/1/e67381 UR - http://dx.doi.org/10.2196/67381 ID - info:doi/10.2196/67381 ER - TY - JOUR AU - Socrates, Vimig AU - Wright, S. Donald AU - Huang, Thomas AU - Fereydooni, Soraya AU - Dien, Christine AU - Chi, Ling AU - Albano, Jesse AU - Patterson, Brian AU - Sasidhar Kanaparthy, Naga AU - Wright, X. Catherine AU - Loza, Andrew AU - Chartash, David AU - Iscoe, Mark AU - Taylor, Andrew Richard PY - 2025/4/11 TI - Identifying Deprescribing Opportunities With Large Language Models in Older Adults: Retrospective Cohort Study JO - JMIR Aging SP - e69504 VL - 8 KW - deprescribing KW - large language models KW - geriatrics KW - potentially inappropriate medication list KW - emergency medicine KW - natural language processing KW - calibration N2 - Background: Polypharmacy, the concurrent use of multiple medications, is prevalent among older adults and associated with increased risks for adverse drug events, including falls. Deprescribing, the systematic process of discontinuing potentially inappropriate medications, aims to mitigate these risks. However, the practical application of deprescribing criteria in emergency settings remains limited due to time constraints and criteria complexity. Objective: This study aims to evaluate the performance of a large language model (LLM)-based pipeline in identifying deprescribing opportunities for older emergency department (ED) patients with polypharmacy, using 3 different sets of criteria: Beers, Screening Tool of Older People's Prescriptions, and Geriatric Emergency Medication Safety Recommendations. The study further evaluates LLM confidence calibration and its ability to improve recommendation performance. Methods: We conducted a retrospective cohort study of older adults presenting to an ED in a large academic medical center in the Northeast United States from January 2022 to March 2022.
A random sample of 100 patients (712 total oral medications) was selected for detailed analysis. The LLM pipeline consisted of two steps: (1) filtering high-yield deprescribing criteria based on patients' medication lists, and (2) applying these criteria using both structured and unstructured patient data to recommend deprescribing. Model performance was assessed by comparing model recommendations to those of trained medical students, with discrepancies adjudicated by board-certified ED physicians. Selective prediction, a method that allows a model to abstain from low-confidence predictions to improve overall reliability, was applied to assess the model's confidence and decision-making thresholds. Results: The LLM was significantly more effective in identifying deprescribing criteria (positive predictive value: 0.83; negative predictive value: 0.93; McNemar test for paired proportions: χ²₁=5.985; P=.02) relative to medical students, but showed limitations in making specific deprescribing recommendations (positive predictive value=0.47; negative predictive value=0.93). Adjudication revealed that while the model excelled at identifying when there was a deprescribing criterion related to one of the patient's medications, it often struggled with determining whether that criterion applied to the specific case due to complex inclusion and exclusion criteria (54.5% of errors) and ambiguous clinical contexts (eg, missing information; 39.3% of errors). Selective prediction only marginally improved LLM performance due to poorly calibrated confidence estimates. Conclusions: This study highlights the potential of LLMs to support deprescribing decisions in the ED by effectively filtering relevant criteria. However, challenges remain in applying these criteria to complex clinical scenarios, as the LLM demonstrated poor performance on more intricate decision-making tasks, with its reported confidence often failing to align with its actual success in these cases.
The findings underscore the need for clearer deprescribing guidelines, improved LLM calibration for real-world use, and better integration of human-artificial intelligence workflows to balance artificial intelligence recommendations with clinician judgment. UR - https://aging.jmir.org/2025/1/e69504 UR - http://dx.doi.org/10.2196/69504 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/69504 ER - TY - JOUR AU - Bolgova, Olena AU - Shypilova, Inna AU - Mavrych, Volodymyr PY - 2025/4/10 TI - Large Language Models in Biochemistry Education: Comparative Evaluation of Performance JO - JMIR Med Educ SP - e67244 VL - 11 KW - ChatGPT KW - Claude KW - Gemini KW - Copilot KW - biochemistry KW - LLM KW - medical education KW - artificial intelligence KW - NLP KW - natural language processing KW - machine learning KW - large language model KW - AI KW - ML KW - comprehensive analysis KW - medical students KW - GPT-4 KW - questionnaire KW - medical course KW - bioenergetics N2 - Background: Recent advancements in artificial intelligence (AI), particularly in large language models (LLMs), have started a new era of innovation across various fields, with medicine at the forefront of this technological revolution. Many studies indicated that at the current level of development, LLMs can pass different board exams. However, the ability to answer specific subject-related questions requires validation. Objective: The objective of this study was to conduct a comprehensive analysis comparing the performance of advanced LLM chatbots, Claude (Anthropic), GPT-4 (OpenAI), Gemini (Google), and Copilot (Microsoft), against the academic results of medical students in the medical biochemistry course. Methods: We used 200 USMLE (United States Medical Licensing Examination)-style multiple-choice questions (MCQs) selected from the course exam database. They encompassed various complexity levels and were distributed across 23 distinctive topics.
The questions with tables and images were not included in the study. The results of 5 successive attempts by Claude 3.5 Sonnet, GPT-4-1106, Gemini 1.5 Flash, and Copilot to answer this questionnaire set were evaluated based on accuracy in August 2024. Statistica 13.5.0.17 (TIBCO Software Inc) was used to analyze the data's basic statistics. Considering the binary nature of the data, the chi-square test was used to compare results among the different chatbots, with a statistical significance level of P<.05. Results: On average, the selected chatbots correctly answered 81.1% (SD 12.8%) of the questions, surpassing the students' performance by 8.3% (P=.02). In this study, Claude showed the best performance in biochemistry MCQs, correctly answering 92.5% (185/200) of questions, followed by GPT-4 (170/200, 85%), Gemini (157/200, 78.5%), and Copilot (128/200, 64%). The chatbots demonstrated the best results in the following 4 topics: eicosanoids (mean 100%, SD 0%), bioenergetics and electron transport chain (mean 96.4%, SD 7.2%), hexose monophosphate pathway (mean 91.7%, SD 16.7%), and ketone bodies (mean 93.8%, SD 12.5%). The Pearson chi-square test indicated a statistically significant association between the answers of all 4 chatbots (P<.001 to P<.04). Conclusions: Our study suggests that different AI models may have unique strengths in specific medical fields, which could be leveraged for targeted support in biochemistry courses. This performance highlights the potential of AI in medical education and assessment.
UR - https://mededu.jmir.org/2025/1/e67244 UR - http://dx.doi.org/10.2196/67244 ID - info:doi/10.2196/67244 ER - TY - JOUR AU - Abdullah, Nailah Nik AU - Tang, Jia AU - Fetrati, Hemad AU - Kaukiah, Binti Nor Fadhilah AU - Saharudin, Bin Sahrin AU - Yong, Sim Vee AU - Yen, How Chia PY - 2025/4/10 TI - MARIA (Medical Assistance and Rehabilitation Intelligent Agent) for Medication Adherence in Patients With Heart Failure: Empirical Results From a Wizard of Oz Systematic Conversational Agent Design Clinical Protocol JO - JMIR Cardio SP - e55846 VL - 9 KW - heart failure KW - medication adherence KW - self-monitoring KW - chatbot KW - conversational agent KW - Wizard of Oz KW - digital health N2 - Background: Nonadherence to medication is a key factor contributing to high heart failure (HF) rehospitalization rates. A conversational agent (CA) or chatbot is a technology that can enhance medication adherence by helping patients self-manage their medication routines at home. Objective: This study outlines the conception of a design method for developing a CA to support patients in medication adherence, utilizing design thinking as the primary process for gathering requirements, prototyping, and testing. We apply this design method to the ongoing development of Medical Assistance and Rehabilitation Intelligent Agent (MARIA), a rule-based CA. Methods: Following the design thinking process, at the ideation stage, we engaged a multidisciplinary group of stakeholders (patients and pharmacists) to elicit requirements for the early conception of MARIA. In collaboration with pharmacists, we structured MARIA's dialogue into a workflow based on Adlerian therapy, a psychoeducational theory. At the testing stage, we conducted an observational study using the Wizard of Oz (WoZ) research method to simulate the MARIA prototype with 20 patient participants. This approach validated and refined our application of Adlerian therapy in the CA's dialogue.
We incorporated human-likeness and trust scoring into user satisfaction assessments after each WoZ session to evaluate MARIA's feasibility and acceptance for medication adherence. Dialogue data collected through WoZ simulations were analyzed using a coding analysis technique. Results: Our design method for the CA revealed gaps in MARIA's conception, including (1) handling negative responses, (2) appropriate use of emoticons to enhance human-likeness, (3) system feedback mechanisms during turn-taking delays, and (4) defining the extent to which a CA can communicate on behalf of a health care provider regarding medication adherence. Conclusions: The design thinking process provided interactive steps to involve users early in the development of a CA. Notably, the use of WoZ in an observational clinical protocol highlighted the following: (1) coding analysis offered guidelines for modeling CA dialogue with patient safety in mind; (2) incorporating human-likeness and trust in user satisfaction assessments provided insights into attributes that foster patient trust in a CA; and (3) the application of Adlerian therapy demonstrated its effectiveness in motivating patients with HF to adhere to medication within a CA framework. In conclusion, our method is valuable for modeling and validating CA interactions with patients, assessing system reliability, user expectations, and constraints. It can guide designers in leveraging existing CA technologies, such as ChatGPT or AWS Lex, for adaptation in health care settings.
UR - https://cardio.jmir.org/2025/1/e55846 UR - http://dx.doi.org/10.2196/55846 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/55846 ER - TY - JOUR AU - Lee, Seonmi AU - Jeong, Jaehyun AU - Kim, Myungsung AU - Lee, Sangil AU - Kim, Sung-Phil AU - Jung, Dooyoung PY - 2025/4/10 TI - Development of a Mobile Intervention for Procrastination Augmented With a Semigenerative Chatbot for University Students: Pilot Randomized Controlled Trial JO - JMIR Mhealth Uhealth SP - e53133 VL - 13 KW - procrastination KW - chatbot KW - generative model KW - semigenerative model KW - time management KW - cognitive behavioral therapy KW - psychological assessment KW - intervention engagement KW - emotional support KW - user experience KW - mobile intervention KW - artificial intelligence KW - AI N2 - Background: Procrastination negatively affects university students' academics and mental health. Traditional time management apps lack therapeutic strategies like cognitive behavioral therapy to address procrastination's psychological aspects. Therefore, we developed and integrated a semigenerative chatbot named Moa into a to-do app. Objective: We intended to determine the benefits of the Moa-integrated to-do app over the app without Moa by verifying behavioral and cognitive changes, analyzing the influence of engagement patterns on the changes, and exploring the user experience. Methods: The developed chatbot Moa guided users over 30 days in terms of self-observation, strategy establishment, and reflection. The architecture comprised response-generating and procrastination factor-detection algorithms. A pilot randomized controlled trial was conducted with 85 participants (n=37, 44% female; n=48, 56% male) from a university in South Korea. The control group used a to-do app without Moa, whereas the treatment group used a fully automated Moa-integrated app.
The Irrational Procrastination Scale, Pure Procrastination Scale, Time Management Behavior Scale, and the Perceived Stress Scale were examined using linear mixed models with repeated measurements obtained before (T0) and after (T1) 1-month use and after 2-month use (T2) to assess the changes in irrational procrastination, pure procrastination, time management behavior, academic self-regulation, and stress. Intervention engagement, divided into "high," "middle," and "low" clusters, was quantified using app access and use of the to-do list and grouped using k-means clustering. In addition, changes in the psychological scale scores between the control and treatment groups were analyzed within each cluster. User experience was quantified based on the usability, feasibility, and acceptability of and satisfaction with the app, whereas thematic analysis explored the users' subjective responses to app use. Results: In total, 75 participants completed the study. The interaction of time × procrastination was significant during the required use period (P=.01). The post hoc test indicated a significant improvement from T0 to T1 in the Time Management Behavior Scale and Perceived Stress Scale scores only in the treatment group (P<.001 and P=.009). The changes in Pure Procrastination Scale score after the required use period were significant in all clusters except for the low cluster of the control group. The high cluster in the treatment group exhibited a significant change in the Irrational Procrastination Scale after Bonferroni correction (P=.046). Usability was determined to be good in the treatment group (mean score 72.8, SD 16.0), and acceptability was higher than in the control group (P=.03). Evaluation of user experience indicated that only the participants in the treatment group achieved self-reflection and experienced an alliance with the app.
Conclusions: The chatbot-integrated app demonstrated greater efficacy in influencing user behavior and providing psychological support. It can serve as a valuable tool for managing procrastination and stress together. Trial Registration: Clinical Research Information Service (CRIS) KCT0009056; https://tinyurl.com/yc84tedk UR - https://mhealth.jmir.org/2025/1/e53133 UR - http://dx.doi.org/10.2196/53133 UR - http://www.ncbi.nlm.nih.gov/pubmed/40208664 ID - info:doi/10.2196/53133 ER - TY - JOUR AU - McAlister, Kelsey AU - Baez, Lara AU - Huberty, Jennifer AU - Kerppola, Marianna PY - 2025/4/8 TI - Chatbot to Support the Mental Health Needs of Pregnant and Postpartum Women (Moment for Parents): Design and Pilot Study JO - JMIR Form Res SP - e72469 VL - 9 KW - perinatal support KW - human-centered design KW - digital health KW - maternal health KW - chatbot KW - digital tool N2 - Background: Maternal mental health disorders are prevalent, yet many individuals do not receive adequate support due to stigma, financial constraints, and limited access to care. Digital interventions, particularly chatbots, have the potential to provide scalable, low-cost support, but few are tailored specifically to the needs of perinatal individuals. Objective: This study aimed to (1) design and develop Moment for Parents, a tailored chatbot for perinatal mental health education and support, and (2) assess usability through engagement, usage patterns, and user experience. Methods: This study used a human-centered design to develop Moment for Parents, a rules-based chatbot to support pregnant and postpartum individuals. In phase 1, ethnographic interviews (n=43) explored user needs to inform chatbot development. In phase 2, a total of 108 pregnant and postpartum individuals were recruited to participate in a pilot test and had unrestricted access to the chatbot. Engagement was tracked over 8 months to assess usage patterns and re-engagement rates.
After 1 month, participants completed a usability, relevance, and satisfaction survey, providing key insights for refining the chatbot. Results: Key themes that emerged from the ethnographic interviews in phase 1 included the need for trusted resources, emotional support, and better mental health guidance. These insights informed chatbot content, including mood-based exercises and coping strategies. Re-engagement was high (69/108, 63.9%), meaning users who had stopped interacting for at least 1 week returned to the chatbot at least once. A large proportion (28/69, 40.6%) re-engaged 3 or more times. Overall, 28/30 (93.3%) found the chatbot relevant to them, though some noted repetitive content and limited response options. Conclusions: The Moment for Parents chatbot successfully engaged pregnant and postpartum individuals with higher-than-typical retention and re-engagement patterns. The findings underscore the importance of flexible, mood-based digital support tailored to perinatal needs. Future research should examine how intermittent chatbot use influences mental health outcomes and refine content delivery to enhance long-term engagement and effectiveness. UR - https://formative.jmir.org/2025/1/e72469 UR - http://dx.doi.org/10.2196/72469 ID - info:doi/10.2196/72469 ER - TY - JOUR AU - Kim, Minjin AU - Kim, Ellie AU - Lee, Hyeongsuk AU - Piao, Meihua AU - Rosen, Brittany AU - Allison, J. Jeroan AU - Zai, H. Adrian AU - Nguyen, L. Hoa AU - Shin, Dong-Soo AU - Kahn, A.
Jessica PY - 2025/4/7 TI - A Culturally Tailored Artificial Intelligence Chatbot (K-Bot) to Promote Human Papillomavirus Vaccination Among Korean Americans: Development and Usability Study JO - Asian Pac Isl Nurs J SP - e71865 VL - 9 KW - human papillomavirus KW - HPV vaccination KW - artificial intelligence KW - AI KW - chatbot intervention KW - Korean Americans KW - usability testing KW - culturally tailored intervention N2 - Background: Human papillomavirus (HPV) is the most common sexually transmitted infection (STI) worldwide and is associated with various cancers, including cervical and oropharyngeal cancers. Despite the availability of effective vaccines, significant disparities in HPV vaccination rates persist, particularly among racial and ethnic minorities, such as Korean Americans. Cultural stigma, language barriers, and limited access to tailored health information contribute to these disparities. Objective: This study aimed to develop and evaluate the usability of K-Bot, an artificial intelligence (AI)-powered, culturally tailored, bilingual (Korean and English) chatbot designed to provide culturally sensitive health information about HPV vaccination to Korean immigrants and Korean Americans. Methods: K-Bot was developed using CloudTuring and Google Dialogflow. Its dialogues were created using Centers for Disease Control and Prevention (CDC) evidence-based HPV information and tailored to the Korean American population based on findings from previous studies. The evaluation and refinement process for K-Bot was organized into 3 phases: (1) expert evaluation by a multidisciplinary panel, (2) usability testing, and (3) iterative refinement based on feedback. An online survey collected demographics, HPV awareness, and vaccination status before 6 focus group sessions (N=21) using semistructured questions guided by Peter Morville's usability framework.
Quantitative data were analyzed descriptively, and thematic analysis assessed usability, cultural relevance, and content clarity across 6 dimensions: desirability, accessibility, findability, credibility, usability, and usefulness. Results: Participants had a mean age of 23.7 (SD 4.7) years, with most being female (n=12, 57.1%), second-generation individuals (n=13, 61.9%), and single (n=20, 95.2%). HPV awareness was high (n=19, 90.5%), vaccine knowledge was also high (n=18, 81.8%), but only 11 (52.4%) participants were vaccinated. Feedback-driven refinements addressed usability challenges, including simplifying navigation and adding visual elements. Participants described K-Bot as a promising tool for promoting HPV vaccination among Korean and Korean American users, citing its bilingual functionality and culturally tailored content as key strengths. Evidence-based information was valued, but participants recommended visuals to improve engagement and reduce cognitive load. Accessibility concerns included broken links, and participants proposed enhancements, such as animations, demographic-specific resources, and interactive features, to improve usability and engagement further. Conclusions: Usability testing of K-Bot revealed its potential as a culturally tailored, bilingual tool for promoting HPV vaccination among Korean immigrants and Korean Americans. Participants valued its evidence-based information, cultural relevance, and bilingual functionality but recommended improvements, such as enhanced navigation, visual elements, and interactive features, to boost engagement and usability. These findings support the potential of AI-driven tools to improve health care access by addressing key barriers to care. Further research is needed to evaluate their broader impact and optimize their design and implementation for individuals with diverse health care needs. 
UR - https://apinj.jmir.org/2025/1/e71865 UR - http://dx.doi.org/10.2196/71865 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/71865 ER - TY - JOUR AU - Cook, A. David AU - Overgaard, Joshua AU - Pankratz, Shane V. AU - Del Fiol, Guilherme AU - Aakre, A. Chris PY - 2025/4/4 TI - Virtual Patients Using Large Language Models: Scalable, Contextualized Simulation of Clinician-Patient Dialogue With Feedback JO - J Med Internet Res SP - e68486 VL - 27 KW - simulation training KW - natural language processing KW - computer-assisted instruction KW - clinical decision-making KW - clinical reasoning KW - machine learning KW - virtual patient KW - natural language generation N2 - Background: Virtual patients (VPs) are computer screen-based simulations of patient-clinician encounters. VP use is limited by cost and low scalability. Objective: We aimed to show that VPs powered by large language models (LLMs) can generate authentic dialogues, accurately represent patient preferences, and provide personalized feedback on clinical performance. We also explored using LLMs to rate the quality of dialogues and feedback. Methods: We conducted an intrinsic evaluation study rating 60 VP-clinician conversations. We used carefully engineered prompts to direct OpenAI's generative pretrained transformer (GPT) to emulate a patient and provide feedback. Using 2 outpatient medicine topics (chronic cough diagnosis and diabetes management), each with permutations representing different patient preferences, we created 60 conversations (dialogues plus feedback): 48 with a human clinician and 12 "self-chat" dialogues with GPT role-playing both the VP and clinician. Primary outcomes were dialogue authenticity and feedback quality, rated using novel instruments for which we conducted a validation study collecting evidence of content, internal structure (reproducibility), relations with other variables, and response process. Each conversation was rated by 3 physicians and by GPT.
Secondary outcomes included user experience, bias, patient preferences represented in the dialogues, and conversation features that influenced authenticity. Results: The average cost per conversation was US $0.51 for GPT-4.0-Turbo and US $0.02 for GPT-3.5-Turbo. Mean (SD) conversation ratings, maximum 6, were overall dialogue authenticity 4.7 (0.7), overall user experience 4.9 (0.7), and average feedback quality 4.7 (0.6). For dialogues created using GPT-4.0-Turbo, physician ratings of patient preferences aligned with intended preferences in 20 to 47 of 48 dialogues (42%-98%). Subgroup comparisons revealed higher ratings for dialogues using GPT-4.0-Turbo versus GPT-3.5-Turbo and for human-generated versus self-chat dialogues. Feedback ratings were similar for human-generated versus GPT-generated ratings, whereas authenticity ratings were lower. We did not perceive bias in any conversation. Dialogue features that detracted from authenticity included that GPT was verbose or used atypical vocabulary (93/180, 51.7% of conversations), was overly agreeable (n=56, 31%), repeated the question as part of the response (n=47, 26%), was easily convinced by clinician suggestions (n=35, 19%), or was not disaffected by poor clinician performance (n=32, 18%). For feedback, detractors included excessively positive feedback (n=42, 23%), failure to mention important weaknesses or strengths (n=41, 23%), or factual inaccuracies (n=39, 22%). Regarding validation of dialogue and feedback scores, items were meticulously developed (content evidence), and we confirmed expected relations with other variables (higher ratings for advanced LLMs and human-generated dialogues). Reproducibility was suboptimal, due largely to variation in LLM performance rather than rater idiosyncrasies. Conclusions: LLM-powered VPs can simulate patient-clinician dialogues, demonstrably represent patient preferences, and provide personalized performance feedback. 
This approach is scalable, globally accessible, and inexpensive. LLM-generated ratings of feedback quality are similar to human ratings. UR - https://www.jmir.org/2025/1/e68486 UR - http://dx.doi.org/10.2196/68486 UR - http://www.ncbi.nlm.nih.gov/pubmed/39854611 ID - info:doi/10.2196/68486 ER - TY - JOUR AU - Zhang, Manlin AU - Zhao, Tianyu PY - 2025/4/2 TI - Citation Accuracy Challenges Posed by Large Language Models JO - JMIR Med Educ SP - e72998 VL - 11 KW - ChatGPT KW - medical education KW - Saudi Arabia KW - perceptions KW - knowledge KW - medical students KW - faculty KW - chatbot KW - qualitative study KW - artificial intelligence KW - AI KW - AI-based tools KW - universities KW - thematic analysis KW - learning KW - satisfaction KW - LLM KW - large language model UR - https://mededu.jmir.org/2025/1/e72998 UR - http://dx.doi.org/10.2196/72998 ID - info:doi/10.2196/72998 ER - TY - JOUR AU - Temsah, Mohamad-Hani AU - Al-Eyadhy, Ayman AU - Jamal, Amr AU - Alhasan, Khalid AU - Malki, H. Khalid PY - 2025/4/2 TI - Authors'
Reply: Citation Accuracy Challenges Posed by Large Language Models JO - JMIR Med Educ SP - e73698 VL - 11 KW - ChatGPT KW - Gemini KW - DeepSeek KW - medical education KW - AI KW - artificial intelligence KW - Saudi Arabia KW - perceptions KW - medical students KW - faculty KW - LLM KW - chatbot KW - qualitative study KW - thematic analysis KW - satisfaction KW - RAG retrieval-augmented generation UR - https://mededu.jmir.org/2025/1/e73698 UR - http://dx.doi.org/10.2196/73698 ID - info:doi/10.2196/73698 ER - TY - JOUR AU - Zisquit, Moreah AU - Shoa, Alon AU - Oliva, Ramon AU - Perry, Stav AU - Spanlang, Bernhard AU - Brunstein Klomek, Anat AU - Slater, Mel AU - Friedman, Doron PY - 2025/4/2 TI - AI-Enhanced Virtual Reality Self-Talk for Psychological Counseling: Formative Qualitative Study JO - JMIR Form Res SP - e67782 VL - 9 KW - virtual human KW - large language model KW - virtual reality KW - self-talk KW - psychotherapy KW - artificial intelligence KW - AI N2 - Background: Access to mental health services continues to pose a global challenge, with current services often unable to meet the growing demand. This has sparked interest in conversational artificial intelligence (AI) agents as potential solutions. Despite this, the development of a reliable virtual therapist remains challenging, and the feasibility of AI fulfilling this sensitive role is still uncertain. One promising approach involves using AI agents for psychological self-talk, particularly within virtual reality (VR) environments. Self-talk in VR allows externalizing self-conversation by enabling individuals to embody avatars representing themselves as both patient and counselor, thus enhancing cognitive flexibility and problem-solving abilities. However, participants sometimes experience difficulties progressing in sessions, which is where AI could offer guidance and support.
Objective: This formative study aims to assess the challenges and advantages of integrating an AI agent into self-talk in VR for psychological counseling, focusing on user experience and the potential role of AI in supporting self-reflection, problem-solving, and positive behavioral change. Methods: We carried out an iterative design and development of a system and protocol integrating large language models (LLMs) within VR self-talk over a period of two and a half years. The design process addressed user interface, speech-to-text functionalities, fine-tuning the LLMs, and prompt engineering. Upon completion of the design process, we conducted a 3-month-long exploratory qualitative study in which 11 healthy participants completed a session that included identifying a problem they wanted to address, attempting to address this problem using self-talk in VR, and then continuing self-talk in VR but this time with the assistance of an LLM-based virtual human. The sessions were carried out with a trained clinical psychologist and followed by semistructured interviews. We used applied thematic analysis after the interviews to code and develop key themes for the participants that addressed our research objective. Results: In total, 4 themes were identified regarding the quality of advice, the potential advantages of human-AI collaboration in self-help, the believability of the virtual human, and user preferences for avatars in the scenario. The participants rated their desire to engage in additional such sessions at 8.3 out of 10, and more than half of the respondents indicated that they preferred using VR self-talk with AI rather than without it. On average, the usefulness of the session was rated 6.9 (SD 0.54), and the degree to which it helped solve their problem was rated 6.1 (SD 1.58).
Participants specifically noted that human-AI collaboration led to improved outcomes and facilitated more positive thought processes, thereby enhancing self-reflection and problem-solving abilities. Conclusions: This exploratory study suggests that the VR self-talk paradigm can be enhanced by LLM-based agents and presents ways to achieve this, potential pitfalls, and additional insights. UR - https://formative.jmir.org/2025/1/e67782 UR - http://dx.doi.org/10.2196/67782 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/67782 ER - TY - JOUR AU - Juels, Parker PY - 2025/4/1 TI - The Importance of Comparing New Technologies (AI) to Existing Tools for Patient Education on Common Dermatologic Conditions: A Commentary JO - JMIR Dermatol SP - e71768 VL - 8 KW - artificial intelligence KW - ChatGPT KW - atopic dermatitis KW - acne vulgaris KW - actinic keratosis KW - rosacea KW - AI KW - diagnosis KW - treatment KW - prognosis KW - dermatological diagnoses KW - chatbots KW - patients KW - dermatologist UR - https://derma.jmir.org/2025/1/e71768 UR - http://dx.doi.org/10.2196/71768 ID - info:doi/10.2196/71768 ER - TY - JOUR AU - Chau, Courtney AU - Feng, Hao AU - Cobos, Gabriela AU - Park, Joyce PY - 2025/4/1 TI - Authors' Reply: The Importance of Comparing New Technologies (AI) to Existing Tools for Patient Education on Common Dermatologic Conditions: A Commentary JO - JMIR Dermatol SP - e72540 VL - 8 KW - artificial intelligence KW - ChatGPT KW - atopic dermatitis KW - acne vulgaris KW - actinic keratosis KW - rosacea KW - AI KW - diagnosis KW - treatment KW - prognosis KW - dermatological diagnoses KW - chatbots KW - patients KW - dermatologist UR - https://derma.jmir.org/2025/1/e72540 UR - http://dx.doi.org/10.2196/72540 ID - info:doi/10.2196/72540 ER - TY - JOUR AU - Chen, David AU - Alnassar, Addeen Saif AU - Avison, Elizabeth Kate AU - Huang, S.
Ryan AU - Raman, Srinivas PY - 2025/3/28 TI - Large Language Model Applications for Health Information Extraction in Oncology: Scoping Review JO - JMIR Cancer SP - e65984 VL - 11 KW - artificial intelligence KW - chatbot KW - data extraction KW - AI KW - conversational agent KW - health information KW - oncology KW - scoping review KW - natural language processing KW - NLP KW - large language model KW - LLM KW - digital health KW - health technology KW - electronic health record N2 - Background: Natural language processing systems for data extraction from unstructured clinical text require expert-driven input for labeled annotations and model training. The natural language processing competency of large language models (LLMs) can enable automated data extraction of important patient characteristics from electronic health records, which is useful for accelerating cancer clinical research and informing oncology care. Objective: This scoping review aims to map the current landscape, including definitions, frameworks, and future directions of LLMs applied to data extraction from clinical text in oncology. Methods: We queried Ovid MEDLINE on June 2, 2024, for primary, peer-reviewed research studies published since 2000, using oncology- and LLM-related keywords. This scoping review included studies that evaluated the performance of an LLM applied to data extraction from clinical text in oncology contexts. Study attributes and main outcomes were extracted to outline key trends of research in LLM-based data extraction. Results: The literature search yielded 24 studies for inclusion. The majority of studies assessed original and fine-tuned variants of the BERT LLM (n=18, 75%), followed by the ChatGPT conversational LLM (n=6, 25%). LLMs for data extraction were commonly applied in pan-cancer clinical settings (n=11, 46%), followed by breast (n=4, 17%) and lung (n=4, 17%) cancer contexts, and were evaluated using multi-institution datasets (n=18, 75%).
Comparing the studies published in 2022-2024 versus 2019-2021, both the total number of studies (18 vs 6) and the proportion of studies using prompt engineering increased (5/18, 28% vs 0/6, 0%), while the proportion using fine-tuning decreased (8/18, 44% vs 6/6, 100%). Advantages of LLMs included positive data extraction performance and reduced manual workload. Conclusions: LLMs applied to data extraction in oncology can serve as useful automated tools to reduce the administrative burden of reviewing patient health records and increase time for patient-facing care. Recent advances in prompt-engineering, fine-tuning, and multimodal data extraction methods present promising directions for future research. Further studies are needed to evaluate the performance of LLM-enabled data extraction in clinical domains beyond the training dataset and to assess the scope and integration of LLMs into real-world clinical environments. UR - https://cancer.jmir.org/2025/1/e65984 UR - http://dx.doi.org/10.2196/65984 ID - info:doi/10.2196/65984 ER - TY - JOUR AU - Roshani, Amin Mohammad AU - Zhou, Xiangyu AU - Qiang, Yao AU - Suresh, Srinivasan AU - Hicks, Steven AU - Sethuraman, Usha AU - Zhu, Dongxiao PY - 2025/3/27 TI - Generative Large Language Model-Powered Conversational AI App for Personalized Risk Assessment: Case Study in COVID-19 JO - JMIR AI SP - e67363 VL - 4 KW - personalized risk assessment KW - large language model KW - conversational AI KW - artificial intelligence KW - COVID-19 N2 - Background: Large language models (LLMs) have demonstrated powerful capabilities in natural language tasks and are increasingly being integrated into health care for tasks like disease risk assessment. Traditional machine learning methods rely on structured data and coding, limiting their flexibility in dynamic clinical environments.
This study presents a novel approach to disease risk assessment using generative LLMs through conversational artificial intelligence (AI), eliminating the need for programming. Objective: This study evaluates the use of pretrained generative LLMs, including LLaMA2-7b and Flan-T5-xl, for COVID-19 severity prediction with the goal of enabling a real-time, no-code, risk assessment solution through chatbot-based, question-answering interactions. To contextualize their performance, we compare LLMs with traditional machine learning classifiers, such as logistic regression, extreme gradient boosting (XGBoost), and random forest, which rely on tabular data. Methods: We fine-tuned LLMs using few-shot natural language examples from a dataset of 393 pediatric patients, developing a mobile app that integrates these models to provide real-time, no-code, COVID-19 severity risk assessment through clinician-patient interaction. The LLMs were compared with traditional classifiers across different experimental settings, using the area under the curve (AUC) as the primary evaluation metric. Feature importance derived from LLM attention layers was also analyzed to enhance interpretability. Results: Generative LLMs demonstrated strong performance in low-data settings. In zero-shot scenarios, the T0-3b-T model achieved an AUC of 0.75, while other LLMs, such as T0pp(8bit)-T and Flan-T5-xl-T, reached 0.67 and 0.69, respectively. At 2-shot settings, logistic regression and random forest achieved an AUC of 0.57, while Flan-T5-xl-T and T0-3b-T obtained 0.69 and 0.65, respectively. By 32-shot settings, Flan-T5-xl-T reached 0.70, similar to logistic regression (0.69) and random forest (0.68), while XGBoost improved to 0.65. These results illustrate the differences in how generative LLMs and traditional models handle the increasing data availability. LLMs perform well in low-data scenarios, whereas traditional models rely more on structured tabular data and labeled training examples. 
Furthermore, the mobile app provides real-time COVID-19 severity assessments and personalized insights through attention-based feature importance, adding value to the clinical interpretation of the results. Conclusions: Generative LLMs provide a robust alternative to traditional classifiers, particularly in scenarios with limited labeled data. Their ability to handle unstructured inputs and deliver personalized, real-time assessments without coding makes them highly adaptable to clinical settings. This study underscores the potential of LLM-powered conversational AI in health care and encourages further exploration of its use for real-time disease risk assessment and decision-making support. UR - https://ai.jmir.org/2025/1/e67363 UR - http://dx.doi.org/10.2196/67363 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/67363 ER - TY - JOUR AU - Immel, Diana AU - Hilpert, Bernhard AU - Schwarz, Patricia AU - Hein, Andreas AU - Gebhard, Patrick AU - Barton, Simon AU - Hurlemann, René PY - 2025/3/26 TI - Patients' and Health Care Professionals' Expectations of Virtual Therapeutic Agents in Outpatient Aftercare: Qualitative Survey Study JO - JMIR Form Res SP - e59527 VL - 9 KW - socially interactive agent KW - e-mental health KW - mental illness KW - mental disorder KW - depression KW - major depressive disorder KW - suicide prevention KW - suicidal ideation KW - outpatient aftercare KW - artificial intelligence KW - virtual therapeutic assistant KW - public health KW - digital technology KW - digital intervention KW - digital health care N2 - Background: Depression is a serious mental health condition that can have a profound impact on the individual experiencing the disorder and those providing care. While psychotherapy and medication can be effective, there are gaps in current approaches, particularly in outpatient care.
This phase is often associated with a high risk of relapse and readmission, and patients often report a lack of support. Socially interactive agents represent an innovative approach to the provision of assistance. Often powered by artificial intelligence, these virtual agents can interact socially and elicit humanlike emotions. In health care, they are used as virtual therapeutic assistants to fill gaps in outpatient aftercare. Objective: We aimed to explore the expectations of patients with depression and health care professionals by conducting a qualitative survey. Our analysis focused on research questions related to the appearance and role of the assistant, the assistant-patient interaction (time of interaction, skills and abilities of the assistant, and modes of interaction), and the therapist-assistant interaction. Methods: A 2-part qualitative study was conducted to explore the perspectives of the 2 groups (patients and care providers). In the first step, care providers (n=30) were recruited during a regional offline meeting. After a short presentation, they were given a link and were asked to complete a semistructured web-based questionnaire. Next, patients (n=20) were recruited from a clinic and were interviewed in a semistructured face-to-face interview. Results: The survey findings suggested that the assistant should be a multimodal communicator (voice, facial expressions, and gestures) and counteract negative self-evaluation. Most participants preferred a female assistant or wanted the option to choose the gender. In total, 24 (80%) health care professionals wanted a selectable option, while patients exhibited a marked preference for a female or diverse assistant. Regarding the patient-assistant interaction, the assistant was seen as a proactive recipient of information, and the patient as a passive one. Gaps in aftercare could be filled by the unlimited availability of the assistant. However, patients should retain their autonomy to avoid dependency.
The monitoring of health status was viewed positively by both groups. A biofeedback function was desired to detect early warning signs of disease. When appropriate to the situation, a sense of humor in the assistant was desirable. The desired skills of the assistant can be summarized as providing structure and emotional support, especially warmth and competence to build trust. Consistency was important for the caregiver to appear authentic. Regarding the assistant-care provider interaction, 3 key areas were identified: objective patient status measurement, emergency suicide prevention, and an information tool and decision support system for health care professionals. Conclusions: Overall, the survey conducted provides innovative guidelines for the development of virtual therapeutic assistants to fill the gaps in patient aftercare. UR - https://formative.jmir.org/2025/1/e59527 UR - http://dx.doi.org/10.2196/59527 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/59527 ER - TY - JOUR AU - Waaler, Niklas Per AU - Hussain, Musarrat AU - Molchanov, Igor AU - Bongo, Ailo Lars AU - Elvevåg, Brita PY - 2025/3/26 TI - Prompt Engineering an Informational Chatbot for Education on Mental Health Using a Multiagent Approach for Enhanced Compliance With Prompt Instructions: Algorithm Development and Validation JO - JMIR AI SP - e69820 VL - 4 KW - schizophrenia KW - mental health KW - prompt engineering KW - AI in health care KW - AI safety KW - self-reflection KW - limiting scope of AI KW - large language model KW - LLM KW - GPT-4 KW - AI transparency KW - adaptive learning N2 - Background: People with schizophrenia often present with cognitive impairments that may hinder their ability to learn about their condition. Education platforms powered by large language models (LLMs) have the potential to improve the accessibility of mental health information. However, the black-box nature of LLMs raises ethical and safety concerns regarding the controllability of chatbots.
In particular, prompt-engineered chatbots may drift from their intended role as the conversation progresses and become more prone to hallucinations. Objective: This study aimed to develop and evaluate a critical analysis filter (CAF) system that ensures that an LLM-powered prompt-engineered chatbot reliably complies with its predefined instructions and scope while delivering validated mental health information. Methods: For a proof of concept, we prompt engineered an educational chatbot for schizophrenia powered by GPT-4 that could dynamically access information from a schizophrenia manual written for people with schizophrenia and their caregivers. In the CAF, a team of prompt-engineered LLM agents was used to critically analyze and refine the chatbot's responses and deliver real-time feedback to the chatbot. To assess the ability of the CAF to re-establish the chatbot's adherence to its instructions, we generated 3 conversations (by conversing with the chatbot with the CAF disabled) wherein the chatbot started to drift from its instructions toward various unintended roles. We used these checkpoint conversations to initialize automated conversations between the chatbot and adversarial chatbots designed to entice it toward unintended roles. Conversations were repeatedly sampled with the CAF enabled and disabled. In total, 3 human raters independently rated each chatbot response according to criteria developed to measure the chatbot's integrity, specifically, its transparency (such as admitting when a statement lacked explicit support from its scripted sources) and its tendency to faithfully convey the scripted information in the schizophrenia manual. Results: In total, 36 responses (3 different checkpoint conversations, 3 conversations per checkpoint, and 4 adversarial queries per conversation) were rated for compliance with the CAF enabled and disabled.
Activating the CAF resulted in a compliance score that was considered acceptable (≥2) in 81% (7/36) of the responses, compared to only 8.3% (3/36) when the CAF was deactivated. Conclusions: Although more rigorous testing in realistic scenarios is needed, our results suggest that self-reflection mechanisms could enable LLMs to be used effectively and safely in educational mental health platforms. This approach harnesses the flexibility of LLMs while reliably constraining their scope to appropriate and accurate interactions. UR - https://ai.jmir.org/2025/1/e69820 UR - http://dx.doi.org/10.2196/69820 UR - http://www.ncbi.nlm.nih.gov/pubmed/39992720 ID - info:doi/10.2196/69820 ER - TY - JOUR AU - Zhu, Jiading AU - Dong, Alec AU - Wang, Cindy AU - Veldhuizen, Scott AU - Abdelwahab, Mohamed AU - Brown, Andrew AU - Selby, Peter AU - Rose, Jonathan PY - 2025/3/21 TI - The Impact of ChatGPT Exposure on User Interactions With a Motivational Interviewing Chatbot: Quasi-Experimental Study JO - JMIR Form Res SP - e56973 VL - 9 KW - chatbot KW - digital health KW - motivational interviewing KW - natural language processing KW - ChatGPT KW - large language models KW - artificial intelligence KW - experimental KW - smoking cessation KW - conversational agent N2 - Background: The worldwide introduction of ChatGPT in November 2022 may have changed how its users perceive and interact with other chatbots. This possibility may confound the comparison of responses to pre-ChatGPT and post-ChatGPT iterations of pre-existing chatbots, in turn affecting the direction of their evolution. Before the release of ChatGPT, we created a therapeutic chatbot, MIBot, whose goal is to use motivational interviewing to guide smokers toward making the decision to quit smoking. We were concerned that measurements going forward would not be comparable to those in the past, impacting the evaluation of future changes to the chatbot.
Objective: The aim of the study is to explore changes in how users interact with MIBot after the release of ChatGPT and examine the relationship between these changes and users' familiarity with ChatGPT. Methods: We compared user interactions with MIBot prior to ChatGPT's release and 6 months after the release. Participants (N=143) were recruited through a web-based platform in November of 2022, prior to the release of ChatGPT, to converse with MIBot, in an experiment we refer to as MIBot (version 5.2). In May 2023, a different set of participants (n=129) was recruited to interact with the same version of MIBot and asked additional questions about their familiarity with ChatGPT, in the experiment called MIBot (version 5.2A). We used the Mann-Whitney U test to compare metrics between cohorts and Spearman rank correlation to assess relationships between familiarity with ChatGPT and other metrics within the MIBot (version 5.2A) cohort. Results: In total, 83 (64.3%) participants in the MIBot (version 5.2A) cohort had used ChatGPT, with 66 (51.2%) using it on a regular basis. Satisfaction with MIBot was significantly lower in the post-ChatGPT cohort (U=11,331.0; P=.001), driven by a decrease in perceived empathy as measured by the Average Consultation and Relational Empathy Measure (U=10,838.0; P=.01). Familiarity with ChatGPT was positively correlated with average response length (ρ=0.181; P=.04) and change in perceived importance of quitting smoking (ρ=0.296; P<.001). Conclusions: The widespread reach of ChatGPT has changed how users interact with MIBot. Post-ChatGPT users are less satisfied with MIBot overall, particularly in terms of perceived empathy. However, users with greater familiarity with ChatGPT provided longer responses and demonstrated a greater increase in their perceived importance of quitting smoking after a session with MIBot.
These findings suggest the need for chatbot developers to adapt to evolving user expectations in the era of advanced generative artificial intelligence. UR - https://formative.jmir.org/2025/1/e56973 UR - http://dx.doi.org/10.2196/56973 ID - info:doi/10.2196/56973 ER - TY - JOUR AU - Sattler, S. Samantha AU - Chetla, Nitin AU - Chen, Matthew AU - Hage, Rajai Tamer AU - Chang, Joseph AU - Guo, Young William AU - Hugh, Jeremy PY - 2025/3/21 TI - Evaluating the Diagnostic Accuracy of ChatGPT-4 Omni and ChatGPT-4 Turbo in Identifying Melanoma: Comparative Study JO - JMIR Dermatol SP - e67551 VL - 8 KW - melanoma KW - skin cancer KW - ChatGPT KW - chatbot KW - dermatology KW - cancer KW - oncology KW - metastases KW - diagnostic KW - diagnosis KW - lesion KW - efficacy KW - machine learning KW - ML KW - artificial intelligence KW - AI KW - algorithm KW - model KW - analytics UR - https://derma.jmir.org/2025/1/e67551 UR - http://dx.doi.org/10.2196/67551 ID - info:doi/10.2196/67551 ER - TY - JOUR AU - Andalib, Saman AU - Spina, Aidin AU - Picton, Bryce AU - Solomon, S. Sean AU - Scolaro, A. John AU - Nelson, M. Ariana PY - 2025/3/21 TI - Using AI to Translate and Simplify Spanish Orthopedic Medical Text: Instrument Validation Study JO - JMIR AI SP - e70222 VL - 4 KW - large language models KW - LLM KW - patient education KW - translation KW - bilingual evaluation understudy KW - GPT-4 KW - Google Translate N2 - Background: Language barriers contribute significantly to health care disparities in the United States, where a sizable proportion of patients are exclusively Spanish speakers. In orthopedic surgery, such barriers impact both patients' comprehension of and patients' engagement with available resources. Studies have explored the utility of large language models (LLMs) for medical translation but have yet to robustly evaluate artificial intelligence (AI)-driven translation and simplification of orthopedic materials for Spanish speakers.
Objective: This study used the bilingual evaluation understudy (BLEU) method to assess translation quality and investigated the ability of AI to simplify patient education materials (PEMs) in Spanish. Methods: PEMs (n=78) from the American Academy of Orthopaedic Surgery were translated from English to Spanish, using 2 LLMs (GPT-4 and Google Translate). The BLEU methodology was applied to compare AI translations with professionally human-translated PEMs. The Friedman test and Dunn multiple comparisons test were used to statistically quantify differences in translation quality. A readability analysis and feature analysis were subsequently performed to evaluate text simplification success and the impact of English text features on BLEU scores. The capability of an LLM to simplify medical language written in Spanish was also assessed. Results: As measured by BLEU scores, GPT-4 showed moderate success in translating PEMs into Spanish but was less successful than Google Translate. Simplified PEMs demonstrated improved readability when compared to original versions (P<.001) but were unable to reach the targeted grade level for simplification. The feature analysis revealed that the total number of syllables and average number of syllables per sentence had the highest impact on BLEU scores. GPT-4 was able to significantly reduce the complexity of medical text written in Spanish (P<.001). Conclusions: Although Google Translate outperformed GPT-4 in translation accuracy, LLMs, such as GPT-4, may provide significant utility in translating medical texts into Spanish and simplifying such texts. We recommend considering a dual approach, using Google Translate for translation and GPT-4 for simplification, to improve medical information accessibility and orthopedic surgery education among Spanish-speaking patients.
UR - https://ai.jmir.org/2025/1/e70222 UR - http://dx.doi.org/10.2196/70222 ID - info:doi/10.2196/70222 ER - TY - JOUR AU - Baek, Gumhee AU - Cha, Chiyoung AU - Han, Jin-Hui PY - 2025/3/19 TI - AI Chatbots for Psychological Health for Health Professionals: Scoping Review JO - JMIR Hum Factors SP - e67682 VL - 12 KW - artificial intelligence KW - AI chatbot KW - psychological health KW - health professionals KW - burnout KW - scoping review N2 - Background: Health professionals face significant psychological burdens, including burnout, anxiety, and depression. These can negatively impact their well-being and patient care. Traditional psychological health interventions often encounter limitations such as a lack of accessibility and privacy. Artificial intelligence (AI) chatbots are being explored as potential solutions to these challenges, offering available and immediate support. Therefore, it is necessary to systematically evaluate the characteristics and effectiveness of AI chatbots designed specifically for health professionals. Objective: This scoping review aims to evaluate the existing literature on the use of AI chatbots for psychological health support among health professionals. Methods: Following Arksey and O'Malley's framework, a comprehensive literature search was conducted across 8 databases, covering studies published before 2024, including backward and forward citation tracking and manual searching from the included studies. Studies were screened for relevance based on inclusion and exclusion criteria; among the 2465 studies retrieved, 10 met the criteria for review. Results: Among the 10 studies, 6 chatbots were delivered via mobile platforms and 4 via web-based platforms, all enabling one-on-one interactions. Natural language processing algorithms were used in 6 studies, and cognitive behavioral therapy techniques were applied to psychological health in 4 studies.
Usability was evaluated in 6 studies through participant feedback and engagement metrics. Improvements in anxiety, depression, and burnout were observed in 4 studies, although one reported an increase in depressive symptoms. Conclusions: AI chatbots show potential as tools to support the psychological health of health professionals by offering personalized and accessible interventions. Nonetheless, further research is required to establish standardized protocols and validate the effectiveness of these interventions. Future studies should focus on refining chatbot designs and assessing their impact on diverse health professionals. UR - https://humanfactors.jmir.org/2025/1/e67682 UR - http://dx.doi.org/10.2196/67682 ID - info:doi/10.2196/67682 ER - TY - JOUR AU - Montagna, Marco AU - Chiabrando, Filippo AU - De Lorenzo, Rebecca AU - Rovere Querini, Patrizia PY - 2025/3/18 TI - Impact of Clinical Decision Support Systems on Medical Students' Case-Solving Performance: Comparison Study with a Focus Group JO - JMIR Med Educ SP - e55709 VL - 11 KW - ChatGPT KW - chatbot KW - machine learning KW - ML KW - artificial intelligence KW - AI KW - algorithm KW - predictive model KW - predictive analytics KW - predictive system KW - practical model KW - deep learning KW - large language models KW - LLMs KW - medical education KW - medical teaching KW - teaching environment KW - clinical decision support systems KW - CDSS KW - decision support KW - decision support tool KW - clinical decision-making KW - innovative teaching N2 - Background: Health care practitioners use clinical decision support systems (CDSS) as an aid in the crucial task of clinical reasoning and decision-making. Traditional CDSS are online repositories (ORs) and clinical practice guidelines (CPG). Recently, large language models (LLMs) such as ChatGPT have emerged as potential alternatives. They have proven to be powerful, innovative tools, yet they are not devoid of worrisome risks.
Objective: This study aims to explore how medical students perform in an evaluated clinical case through the use of different CDSS tools. Methods: The authors randomly divided medical students into 3 groups, CPG, n=6 (38%); OR, n=5 (31%); and ChatGPT, n=5 (31%), and assigned each group a different type of CDSS for guidance in answering prespecified questions, assessing how students' speed and ability at resolving the same clinical case varied accordingly. External reviewers evaluated all answers based on accuracy and completeness metrics (score: 1-5). The authors analyzed and categorized group scores according to the skill investigated: differential diagnosis, diagnostic workup, and clinical decision-making. Results: Answering time showed a trend for the ChatGPT group to be the fastest. The mean scores for completeness were as follows: CPG 4.0, OR 3.7, and ChatGPT 3.8 (P=.49). The mean scores for accuracy were as follows: CPG 4.0, OR 3.3, and ChatGPT 3.7 (P=.02). Aggregating scores according to the 3 students' skill domains, trends in differences among the groups emerged more clearly, with the CPG group performing best in nearly all domains and maintaining almost perfect alignment between its completeness and accuracy scores. Conclusions: This hands-on session provided valuable insights into the potential perks and associated pitfalls of LLMs in medical education and practice. It suggested the critical need to include teachings in medical degree courses on how to properly take advantage of LLMs, as the potential for misuse is evident and real.
UR - https://mededu.jmir.org/2025/1/e55709 UR - http://dx.doi.org/10.2196/55709 ID - info:doi/10.2196/55709 ER - TY - JOUR AU - Šuvalov, Hendrik AU - Lepson, Mihkel AU - Kukk, Veronika AU - Malk, Maria AU - Ilves, Neeme AU - Kuulmets, Hele-Andra AU - Kolde, Raivo PY - 2025/3/18 TI - Using Synthetic Health Care Data to Leverage Large Language Models for Named Entity Recognition: Development and Validation Study JO - J Med Internet Res SP - e66279 VL - 27 KW - natural language processing KW - named entity recognition KW - large language model KW - synthetic data KW - LLM KW - NLP KW - machine learning KW - artificial intelligence KW - language model KW - NER KW - medical entity KW - Estonian KW - health care data KW - annotated data KW - data annotation KW - clinical decision support KW - data mining N2 - Background: Named entity recognition (NER) plays a vital role in extracting critical medical entities from health care records, facilitating applications such as clinical decision support and data mining. Developing robust NER models for low-resource languages, such as Estonian, remains a challenge due to the scarcity of annotated data and domain-specific pretrained models. Large language models (LLMs) have proven to be promising in understanding text from any language or domain. Objective: This study addresses the development of medical NER models for low-resource languages, specifically Estonian. We propose a novel approach by generating synthetic health care data and using LLMs to annotate them. These synthetic data are then used to train a high-performing NER model, which is applied to real-world medical texts, preserving patient data privacy.
Methods: Our approach to overcoming the shortage of annotated Estonian health care texts involves a three-step pipeline: (1) synthetic health care data are generated using a locally trained GPT-2 model on Estonian medical records, (2) the synthetic data are annotated with LLMs, specifically GPT-3.5-Turbo and GPT-4, and (3) the annotated synthetic data are then used to fine-tune an NER model, which is later tested on real-world medical data. This paper compares the performance of different prompts; assesses the impact of GPT-3.5-Turbo, GPT-4, and a local LLM; and explores the relationship between the amount of annotated synthetic data and model performance. Results: The proposed methodology demonstrates significant potential in extracting named entities from real-world medical texts. Our top-performing setup achieved an F1-score of 0.69 for drug extraction and 0.38 for procedure extraction. These results indicate a strong performance in recognizing certain entity types while highlighting the complexity of extracting procedures. Conclusions: This paper demonstrates a successful approach to leveraging LLMs for training NER models using synthetic data, effectively preserving patient privacy. By avoiding reliance on human-annotated data, our method shows promise in developing models for low-resource languages, such as Estonian. Future work will focus on refining the synthetic data generation and expanding the method's applicability to other domains and languages. UR - https://www.jmir.org/2025/1/e66279 UR - http://dx.doi.org/10.2196/66279 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/66279 ER - TY - JOUR AU - Haegens, L. Lex AU - Huiskes, B. Victor J. AU - van den Bemt, F. Bart J. AU - Bekker, L. 
Charlotte PY - 2025/3/13 TI - Factors Influencing the Intentions of Patients With Inflammatory Rheumatic Diseases to Use a Digital Human for Medication Information: Qualitative Study JO - J Med Internet Res SP - e57697 VL - 27 KW - digital human KW - information provision KW - intention to use KW - qualitative study KW - focus groups KW - drug-related problems KW - medication safety KW - safety information KW - information seeking KW - Netherlands KW - Pharmacotherapy KW - medication KW - telehealth KW - communication technologies KW - medication information KW - rheumatic diseases KW - rheumatology N2 - Background: Patients with inflammatory rheumatic diseases (IRDs) frequently experience drug-related problems (DRPs). DRPs can have negative health consequences and should be addressed promptly to prevent complications. A digital human, which is an embodied conversational agent, could provide medication-related information in a time- and place-independent manner to support patients in preventing and decreasing DRPs. Objective: This study aims to identify factors that influence the intention of patients with IRDs to use a digital human to retrieve medication-related information. Methods: A qualitative study with 3 in-person focus groups was conducted among adult patients diagnosed with an IRD in the Netherlands. The prototype of the digital human is an innovative tool that provides spoken answers to medication-related questions and provides information linked to the topic, such as (instructional) videos, drug leaflets, and other relevant sources. Before the focus group, participants completed a preparatory exercise at home to become familiar with the digital human. A semistructured interview guide based on the Proctor framework for implementation determinants was used to interview participants about the acceptability, adoption, appropriateness, costs, feasibility, fidelity, penetration, and sustainability of the digital human. 
Focus groups were recorded, transcribed, and analyzed thematically. Results: The participants included 22 patients, with a median age of 68 (IQR 52-75) years, of whom 64% (14/22) were female. In total, 6 themes describing factors influencing patients' intention to use a digital human were identified: (1) the degree to which individual needs for medication-related information are met; (2) confidence in one's ability to use the digital human; (3) the degree to which using the digital human resembles interacting with a human; (4) technical functioning of the digital human; (5) privacy and security; and (6) expected benefit of using the digital human. Conclusions: The intention of patients with IRDs to use a novel digital human to retrieve medication-related information was influenced by factors related to each patient's information needs and confidence in their ability to use the digital human, features of the digital human, and the expected benefits of using the digital human. These identified themes should be considered during the further development of the digital human and during implementation to increase intention to use and future adoption. Thereafter, the effect of applying a digital human as an instrument to improve patients' self-management regarding DRPs could be researched. UR - https://www.jmir.org/2025/1/e57697 UR - http://dx.doi.org/10.2196/57697 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/57697 ER - TY - JOUR AU - Hanna, J. John AU - Wakene, D. Abdi AU - Johnson, O. Andrew AU - Lehmann, U. Christoph AU - Medford, J. 
Richard PY - 2025/3/13 TI - Assessing Racial and Ethnic Bias in Text Generation by Large Language Models for Health Care–Related Tasks: Cross-Sectional Study JO - J Med Internet Res SP - e57257 VL - 27 KW - sentiment analysis KW - racism KW - bias KW - artificial intelligence KW - reading ease KW - word frequency KW - large language models KW - text generation KW - healthcare KW - task KW - ChatGPT KW - cross sectional KW - consumer-directed KW - human immunodeficiency virus N2 - Background: Racial and ethnic bias in large language models (LLMs) used for health care tasks is a growing concern, as it may contribute to health disparities. In response, LLM operators implemented safeguards against prompts that overtly seek certain biases. Objective: This study aims to investigate potential racial and ethnic bias among 4 popular LLMs: GPT-3.5-turbo (OpenAI), GPT-4 (OpenAI), Gemini-1.0-pro (Google), and Llama3-70b (Meta) in generating health care consumer–directed text in the absence of overtly biased queries. Methods: In this cross-sectional study, the 4 LLMs were prompted to generate discharge instructions for patients with HIV. Each patient encounter's deidentified metadata, including race/ethnicity as a variable, was passed in a table format through a prompt 4 times, altering only the race/ethnicity information (African American, Asian, Hispanic White, and non-Hispanic White) each time, while keeping all other information constant. The prompt requested the model to write discharge instructions for each encounter without explicitly mentioning race or ethnicity. The LLM-generated instructions were analyzed for sentiment, subjectivity, reading ease, and word frequency by race/ethnicity. Results: The only observed statistically significant difference between race/ethnicity groups was found in entity count (GPT-4, df=42, P=.047). 
However, post hoc chi-square analysis for GPT-4's entity counts showed no significant pairwise differences among race/ethnicity categories after Bonferroni correction. Conclusions: All 4 LLMs were relatively invariant to race/ethnicity in terms of linguistic and readability measures. While our study used proxy linguistic and readability measures to investigate racial and ethnic bias among 4 LLM responses in a health care–related task, there is an urgent need to establish universally accepted standards for measuring bias in LLM-generated responses. Further studies are needed to validate these results and assess their implications. UR - https://www.jmir.org/2025/1/e57257 UR - http://dx.doi.org/10.2196/57257 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/57257 ER - TY - JOUR AU - Wolfe, H. Brooke AU - Oh, Jung Yoo AU - Choung, Hyesun AU - Cui, Xiaoran AU - Weinzapfel, Joshua AU - Cooper, Amanda R. AU - Lee, Hae-Na AU - Lehto, Rebecca PY - 2025/3/13 TI - Caregiving Artificial Intelligence Chatbot for Older Adults and Their Preferences, Well-Being, and Social Connectivity: Mixed-Method Study JO - J Med Internet Res SP - e65776 VL - 27 KW - older adults KW - technology use KW - AI chatbots KW - artificial intelligence KW - well-being KW - social connectedness KW - mobile phone N2 - Background: The increasing number of older adults who are living alone poses challenges for maintaining their well-being, as they often need support with daily tasks, health care services, and social connections. However, advancements in artificial intelligence (AI) technologies have revolutionized health care and caregiving through their capacity to monitor health, provide medication and appointment reminders, and provide companionship to older adults. Nevertheless, the adaptability of these technologies for older adults is stymied by usability issues. 
This study explores how older adults use and adapt to AI technologies, highlighting both persistent barriers and opportunities for potential enhancements. Objective: This study aimed to provide deeper insights into older adults' engagement with technology and AI. The technologies currently used, potential technologies desired for daily life integration, personal technology concerns faced, and overall attitudes toward technology and AI are explored. Methods: Using mixed methods, participants (N=28) completed both a semistructured interview and surveys consisting of health and well-being measures. Participants then participated in a research team–facilitated interaction with an AI chatbot, Amazon Alexa. Interview transcripts were analyzed using thematic analysis, and surveys were evaluated using descriptive statistics. Results: Participants' average age was 71 years (range 65-84 years). Most participants were familiar with technology use, especially using smartphones (26/28, 93%) and desktops and laptops (21/28, 75%). Participants rated appointment reminders (25/28, 89%), emergency assistance (22/28, 79%), and health monitoring (21/28, 75%) as the most desirable features of AI chatbots for adoption. Digital devices were commonly used for entertainment, health management, professional productivity, and social connectivity. Participants were most interested in integrating technology into their personal lives for scheduling reminders, chore assistance, and providing care to others. Challenges in using new technology included the commitment required to learn new technologies, concerns about lack of privacy, and worries about future technology dependence. Overall, older adults' attitudes coalesced into 3 orientations, which we label as technology adapters, technologically wary, and technology resisters. 
These results illustrate that not all older adults are resistant to technology and AI. Instead, older adults fall along a spectrum from willing, to hesitant but willing, to unwilling to use technology and AI. Researchers can use these findings by asking older adults about their orientation toward technology to facilitate the integration of new technologies in line with each person's comfort and preferences. Conclusions: To ensure that AI technologies effectively support older adults, it is essential to foster an ongoing dialogue among developers, older adults, families, and their caregivers, focusing on inclusive designs to meet older adults' needs. UR - https://www.jmir.org/2025/1/e65776 UR - http://dx.doi.org/10.2196/65776 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/65776 ER - TY - JOUR AU - Trivedi, Ritu AU - Shaw, Tim AU - Sheahen, Brodie AU - Chow, K. Clara AU - Laranjo, Liliana PY - 2025/3/12 TI - Patient Perspectives on Conversational Artificial Intelligence for Atrial Fibrillation Self-Management: Qualitative Analysis JO - J Med Internet Res SP - e64325 VL - 27 KW - atrial fibrillation KW - conversational agents KW - qualitative research KW - self-management KW - digital health KW - patient perspective KW - conversational artificial intelligence KW - speech recognition N2 - Background: Conversational artificial intelligence (AI) allows for engaging interactions; however, its acceptability, barriers, and enablers to support patients with atrial fibrillation (AF) are unknown. Objective: This work stems from the Coordinating Health care with AI–supported Technology for patients with AF (CHAT-AF) trial and aims to explore patient perspectives on receiving support from a conversational AI support program. Methods: Patients with AF recruited for a randomized controlled trial who received the intervention were approached for semistructured interviews using purposive sampling. 
The 6-month intervention consisted of fully automated conversational AI phone calls (with speech recognition and natural language processing) that assessed patient health and provided self-management support and education. Interviews were recorded, transcribed, and thematically analyzed. Results: We conducted 30 interviews (mean age 65.4, SD 11.9 years; 21/30, 70% male). Four themes were identified: (1) interaction with a voice-based conversational AI program (human-like interactions, restriction to prespecified responses, trustworthiness of hospital-delivered conversational AI); (2) engagement is influenced by the personalization of content, delivery mode, and frequency (tailoring to own health context, interest in novel information regarding health, overwhelmed with large volumes of information, flexibility provided by multichannel delivery); (3) improving access to AF care and information (continuity in support, enhancing access to health-related information); (4) empowering patients to better self-manage their AF (encouraging healthy habits through frequent reminders, reassurance from rhythm-monitoring devices). Conclusions: Although conversational AI was described as an engaging way to receive education and self-management support, improvements such as enhanced dialogue flexibility to allow for more naturally flowing conversations and tailoring to patient health context were also mentioned. 
Trial Registration: Australian New Zealand Clinical Trials Registry ACTRN12621000174886; https://tinyurl.com/3nn7tk72 International Registered Report Identifier (IRRID): RR2-10.2196/34470 UR - https://www.jmir.org/2025/1/e64325 UR - http://dx.doi.org/10.2196/64325 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/64325 ER - TY - JOUR AU - Monzon, Noahlana AU - Hays, Alan Franklin PY - 2025/3/11 TI - Leveraging Generative Artificial Intelligence to Improve Motivation and Retrieval in Higher Education Learners JO - JMIR Med Educ SP - e59210 VL - 11 KW - educational technology KW - retrieval practice KW - flipped classroom KW - cognitive engagement KW - personalized learning KW - generative artificial intelligence KW - higher education KW - university education KW - learners KW - instructors KW - curriculum structure KW - learning KW - technologies KW - innovation KW - academic misconduct KW - gamification KW - self-directed KW - socio-economic disparities KW - interactive approach KW - medical education KW - chatGPT KW - machine learning KW - AI KW - large language models UR - https://mededu.jmir.org/2025/1/e59210 UR - http://dx.doi.org/10.2196/59210 ID - info:doi/10.2196/59210 ER - TY - JOUR AU - Benaïche, Alexandre AU - Billaut-Laden, Ingrid AU - Randriamihaja, Herivelo AU - Bertocchio, Jean-Philippe PY - 2025/3/10 TI - Assessment of the Efficiency of a ChatGPT-Based Tool, MyGenAssist, in an Industry Pharmacovigilance Department for Case Documentation: Cross-Over Study JO - J Med Internet Res SP - e65651 VL - 27 KW - MyGenAssist KW - large language model KW - artificial intelligence KW - ChatGPT KW - pharmacovigilance KW - efficiency N2 - Background: At the end of 2023, Bayer AG launched its own internal large language model (LLM), MyGenAssist, based on ChatGPT technology to overcome data privacy concerns. 
Such a tool may reduce the burden of repetitive and recurrent tasks, freeing time that could then be dedicated to activities with higher added value. Although there is a current worldwide reflection on whether artificial intelligence should be integrated into pharmacovigilance, the medical literature does not provide enough data concerning LLMs and their daily applications in such a setting. Here, we studied how this tool could improve the case documentation process, which is a duty for authorization holders as per European and French good vigilance practices. Objective: The aim of the study is to test whether the use of an LLM could improve the pharmacovigilance documentation process. Methods: MyGenAssist was trained to draft templates for case documentation letters meant to be sent to the reporters. Information provided within the template changes depending on the case: such data come from a table sent to the LLM. We then measured the time spent on each case for a period of 4 months (2 months before using the tool and 2 months after its implementation). A multiple linear regression model was created with the time spent on each case as the explained variable, and all parameters that could influence this time were included as explanatory variables (use of MyGenAssist, type of recipient, number of questions, and user). To test whether the use of this tool impacts the process, we compared the recipients' response rates with and without the use of MyGenAssist. Results: MyGenAssist yielded an average time saving of 23.3% (95% CI 13.8%-32.8%) per case (P<.001; adjusted R2=0.286), which could represent an average of 10.7 (SD 3.6) working days saved each year. The response rate was not modified by the use of MyGenAssist (20/48, 42% vs 27/74, 36%; P=.57), whether the recipient was a physician or a patient. 
No significant difference was found regarding the time spent by the recipient to answer (mean 2.20, SD 3.27 days vs mean 2.65, SD 3.30 days after the last attempt of contact; P=.64). The implementation of MyGenAssist for this activity only required a 2-hour training session for the pharmacovigilance team. Conclusions: Our study is the first to show that a ChatGPT-based tool can improve the efficiency of a good practice activity without needing a long training session for the affected workforce. These first encouraging results could be an incentive for the implementation of LLMs in other processes. UR - https://www.jmir.org/2025/1/e65651 UR - http://dx.doi.org/10.2196/65651 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/65651 ER - TY - JOUR AU - Chen, Chen AU - Lam, Tai Kok AU - Yip, Man Ka AU - So, Kwan Hung AU - Lum, Sang Terry Yat AU - Wong, Kei Ian Chi AU - Yam, C. Jason AU - Chui, Ling Celine Sze AU - Ip, Patrick PY - 2025/3/6 TI - Comparison of an AI Chatbot With a Nurse Hotline in Reducing Anxiety and Depression Levels in the General Population: Pilot Randomized Controlled Trial JO - JMIR Hum Factors SP - e65785 VL - 12 KW - AI chatbot KW - anxiety KW - depression KW - effectiveness KW - artificial intelligence N2 - Background: Artificial intelligence (AI) chatbots have been customized to deliver on-demand support for people with mental health problems. However, the effectiveness of AI chatbots in tackling mental health problems among the general public in Hong Kong remains unclear. Objective: This study aimed to develop a local AI chatbot and compare the effectiveness of the AI chatbot with a conventional nurse hotline in reducing the level of anxiety and depression among individuals in Hong Kong. Methods: This study was a pilot randomized controlled trial conducted from October 2022 to March 2023, involving 124 participants allocated randomly (1:1 ratio) into the AI chatbot and nurse hotline groups. 
Among these, 62 participants in the AI chatbot group and 41 in the nurse hotline group completed both the pre- and postquestionnaires, including the GAD-7 (Generalized Anxiety Disorder Scale-7), PHQ-9 (Patient Health Questionnaire-9), and satisfaction questionnaire. Comparisons were conducted using independent and paired sample t tests (2-tailed) and the χ2 test to analyze changes in anxiety and depression levels. Results: Compared to the mean baseline depression score of 5.13 (SD 4.623), the mean postintervention depression score in the chatbot group was 3.68 (SD 4.397), which was significantly lower (P=.008). Similarly, a reduced anxiety score was observed after the chatbot intervention (pre vs post: mean 4.74, SD 4.742 vs mean 3.4, SD 3.748; P=.005). No significant between-group differences were found in the pre-post score changes for either depression (P=.38) or anxiety (P=.19). No statistically significant difference was observed in service satisfaction between the two platforms (P=.32). Conclusions: The AI chatbot was comparable to the traditional nurse hotline in alleviating participants' anxiety and depression after responding to inquiries. Moreover, the AI chatbot has shown potential in alleviating short-term anxiety and depression compared to the nurse hotline. While the AI chatbot presents a promising solution for offering accessible strategies to the public, more extensive randomized controlled studies are necessary to further validate its effectiveness. 
Trial Registration: ClinicalTrials.gov NCT06621134; https://clinicaltrials.gov/study/NCT06621134 UR - https://humanfactors.jmir.org/2025/1/e65785 UR - http://dx.doi.org/10.2196/65785 ID - info:doi/10.2196/65785 ER - TY - JOUR AU - Uddin, Jamal AU - Feng, Cheng AU - Xu, Junfang PY - 2025/3/6 TI - Health Communication on the Internet: Promoting Public Health and Exploring Disparities in the Generative AI Era JO - J Med Internet Res SP - e66032 VL - 27 KW - internet KW - generative AI KW - artificial intelligence KW - ChatGPT KW - health communication KW - health promotion KW - health disparity KW - health KW - communication KW - AI KW - generative KW - tool KW - genAI KW - gratification theory KW - gratification KW - public health KW - inequity KW - disparity UR - https://www.jmir.org/2025/1/e66032 UR - http://dx.doi.org/10.2196/66032 UR - http://www.ncbi.nlm.nih.gov/pubmed/40053755 ID - info:doi/10.2196/66032 ER - TY - JOUR AU - McBain, K. Ryan AU - Cantor, H. Jonathan AU - Zhang, Ang Li AU - Baker, Olesya AU - Zhang, Fang AU - Halbisen, Alyssa AU - Kofner, Aaron AU - Breslau, Joshua AU - Stein, Bradley AU - Mehrotra, Ateev AU - Yu, Hao PY - 2025/3/5 TI - Competency of Large Language Models in Evaluating Appropriate Responses to Suicidal Ideation: Comparative Study JO - J Med Internet Res SP - e67891 VL - 27 KW - depression KW - suicide KW - mental health KW - large language model KW - chatbot KW - digital health KW - Suicidal Ideation Response Inventory KW - ChatGPT KW - suicidologist KW - artificial intelligence N2 - Background: With suicide rates in the United States at an all-time high, individuals experiencing suicidal ideation are increasingly turning to large language models (LLMs) for guidance and support. Objective: The objective of this study was to assess the competency of 3 widely used LLMs to distinguish appropriate versus inappropriate responses when engaging individuals who exhibit suicidal ideation. 
Methods: This observational, cross-sectional study evaluated responses to the revised Suicidal Ideation Response Inventory (SIRI-2) generated by ChatGPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro. Data collection and analyses were conducted in July 2024. A common training module for mental health professionals, SIRI-2 provides 24 hypothetical scenarios in which a patient exhibits depressive symptoms and suicidal ideation, followed by two clinician responses. Clinician responses were scored from −3 (highly inappropriate) to +3 (highly appropriate). All 3 LLMs were provided with a standardized set of instructions to rate clinician responses. We compared LLM responses to those of expert suicidologists, conducting linear regression analyses and converting LLM responses to z scores to identify outliers (z score >1.96 or <−1.96). Satisfaction with the reassurance call averaged 4.14 (SD 0.66; group 1) and 4.54 (SD 0.52; group 2), with no significant difference between AI and humans (P=.11). Satisfaction with the AI-assisted reassurance call averaged 3.43 (SD 0.94). Satisfaction with the management of complications on the Likert scale averaged 3.79 (SD 0.70) and 4.23 (SD 0.83), respectively, showing no significant difference (P=.14), but a significant difference was observed when using the VAS (P=.01), with 6.64 (SD 2.13) in group 1 and 8.69 (SD 1.80) in group 2. Anxiety about complications using the State-Trait Anxiety Inventory averaged 36.43 (SD 9.17) and 39.23 (SD 8.51; P=.33), while anxiety assessed with the VAS averaged 4.86 (SD 2.28) and 3.46 (SD 3.38; P=.18), respectively, showing no significant differences. Multiple regression analysis was performed on all outcomes, and human callers showed superior satisfaction to AI in the management of complications. Otherwise, most of the other variables showed no significant differences (P>.05). Conclusions: This is the first study to use AI for patient reassurance regarding complications after ureteric stent placement. 
The study found that patients were similarly satisfied with reassurance calls conducted by AI or humans. Further research in larger populations is warranted to confirm these findings. Trial Registration: Clinical Research Information Service KCT0008062; https://tinyurl.com/4s8725w2 UR - https://www.jmir.org/2025/1/e56039 UR - http://dx.doi.org/10.2196/56039 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/56039 ER - TY - JOUR AU - Bousquet, Cedric AU - Beltramin, Divà PY - 2025/1/20 TI - Advantages and Inconveniences of a Multi-Agent Large Language Model System to Mitigate Cognitive Biases in Diagnostic Challenges JO - J Med Internet Res SP - e69742 VL - 27 KW - large language model KW - multi-agent system KW - diagnostic errors KW - cognition KW - clinical decision-making KW - cognitive bias KW - generative artificial intelligence UR - https://www.jmir.org/2025/1/e69742 UR - http://dx.doi.org/10.2196/69742 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/69742 ER - TY - JOUR AU - Sezgin, Emre AU - Kocaballi, Baki Ahmet PY - 2025/1/20 TI - Era of Generalist Conversational Artificial Intelligence to Support Public Health Communications JO - J Med Internet Res SP - e69007 VL - 27 KW - messaging apps KW - public health communication KW - language models KW - artificial intelligence KW - AI KW - generative AI KW - conversational AI UR - https://www.jmir.org/2025/1/e69007 UR - http://dx.doi.org/10.2196/69007 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/69007 ER - TY - JOUR AU - Sanchez Ortuño, Montserrat María AU - Pecune, Florian AU - Coelho, Julien AU - Micoulaud-Franchi, Arthur Jean AU - Salles, Nathalie AU - Auriacombe, Marc AU - Serre, Fuschia AU - Levavasseur, Yannick AU - De Sevin, Etienne AU - Sagaspe, Patricia AU - Philip, Pierre PY - 2025/1/15 TI - Determinants of Dropout From a Virtual Agent–Based App for Insomnia Management in a Self-Selected Sample of Users With Insomnia Symptoms: Longitudinal Study JO - JMIR Ment 
Health SP - e51022 VL - 12 KW - insomnia KW - digital behavioral therapy KW - mobile health KW - dropout KW - virtual agent–based app KW - virtual agent KW - user KW - digital intervention KW - smartphone KW - mental health KW - implementation KW - cognitive behavioral therapy KW - CBT N2 - Background: Fully automated digital interventions delivered via smartphone apps have proven efficacious for a wide variety of mental health outcomes. An important aspect is that they are accessible at a low cost, thereby increasing their potential public impact and reducing disparities. However, a major challenge to their successful implementation is the phenomenon of users dropping out early. Objective: The purpose of this study was to pinpoint the factors influencing early dropout in a sample of self-selected users of a virtual agent (VA)–based behavioral intervention for managing insomnia, named KANOPEE, which is freely available in France. Methods: From January 2021 to December 2022, of the 9657 individuals, aged 18 years or older, who downloaded and completed the KANOPEE screening interview and had either subclinical or clinical insomnia symptoms, 4295 (44.5%) dropped out (ie, did not return to the app to continue filling in subsequent assessments). The primary outcome was a binary variable: having dropped out after completing the screening assessment (early dropout) or having completed all the treatment phases (n=551). Multivariable logistic regression analysis was used to identify predictors of dropout among a set of sociodemographic, clinical, and sleep diary variables, and users' perceptions of the treatment program, collected during the screening interview. Results: The users' mean age was 47.95 (SD 15.21) years. Of those who dropped out early and those who completed the treatment, 65.1% (3153/4846) were women and 34.9% (1693/4846) were men. 
Younger age (adjusted odds ratio [AOR] 0.98, 95% CI 0.97-0.99), lower education level (compared to middle school; high school: AOR 0.56, 95% CI 0.35-0.90; bachelor's degree: AOR 0.35, 95% CI 0.23-0.52; master's degree or higher: AOR 0.35, 95% CI 0.22-0.55), poorer nocturnal sleep (sleep efficiency: AOR 0.64, 95% CI 0.42-0.96; number of nocturnal awakenings: AOR 1.13, 95% CI 1.04-1.23), and more severe depression symptoms (AOR 1.12, 95% CI 1.04-1.21) were significant predictors of dropping out. When measures of perceptions of the app were included in the model, perceived benevolence and credibility of the VA decreased the odds of dropout (AOR 0.91, 95% CI 0.85-0.97). Conclusions: As in traditional face-to-face cognitive behavioral therapy for insomnia, the presence of significant depression symptoms plays an important role in treatment dropout. This variable represents an important target to address to increase early engagement with fully automated insomnia management programs. Furthermore, our results support the contention that a VA can provide relevant user stimulation that will eventually pay off in terms of user engagement. 
Trial Registration: ClinicalTrials.gov NCT05074901; https://clinicaltrials.gov/study/NCT05074901?a=1 UR - https://mental.jmir.org/2025/1/e51022 UR - http://dx.doi.org/10.2196/51022 ID - info:doi/10.2196/51022 ER - TY - JOUR AU - Kim, Myungsung AU - Lee, Seonmi AU - Kim, Sieun AU - Heo, Jeong-in AU - Lee, Sangil AU - Shin, Yu-Bin AU - Cho, Chul-Hyun AU - Jung, Dooyoung PY - 2025/1/14 TI - Therapeutic Potential of Social Chatbots in Alleviating Loneliness and Social Anxiety: Quasi-Experimental Mixed Methods Study JO - J Med Internet Res SP - e65589 VL - 27 KW - artificial intelligence KW - AI KW - social chatbot KW - loneliness KW - social anxiety KW - exploratory research KW - mixed methods study N2 - Background: Artificial intelligence (AI) social chatbots represent a major advancement in merging technology with mental health, offering benefits through natural and emotional communication. Unlike task-oriented chatbots, social chatbots build relationships and provide social support, which can positively impact mental health outcomes like loneliness and social anxiety. However, the specific effects and mechanisms through which these chatbots influence mental health remain underexplored. Objective: This study explores the mental health potential of AI social chatbots, focusing on their impact on loneliness and social anxiety among university students. The study seeks to (1) assess the impact of engaging with an AI social chatbot in South Korea, "Luda Lee," on these mental health outcomes over a 4-week period and (2) analyze user experiences to identify perceived strengths and weaknesses, as well as the applicability of social chatbots in therapeutic contexts. Methods: A single-group pre-post study was conducted with university students who interacted with the chatbot for 4 weeks. Measures included loneliness, social anxiety, and mood-related symptoms such as depression, assessed at baseline, week 2, and week 4. 
Quantitative measures were analyzed using analysis of variance and stepwise linear regression to identify the factors affecting change. Thematic analysis was used to analyze user experiences and assess the perceived benefits and challenges of chatbots. Results: A total of 176 participants (88 males; average age 22.6, SD 2.92 years) took part in the study. Baseline measures indicated slightly elevated levels of loneliness (UCLA Loneliness Scale, mean 27.97, SD 11.07) and social anxiety (Liebowitz Social Anxiety Scale, mean 25.3, SD 14.19) compared to typical university students. Significant reductions were observed, with loneliness decreasing by week 2 (t175=2.55, P=.02) and social anxiety decreasing by week 4 (t175=2.67, P=.01). Stepwise linear regression identified baseline loneliness (β=0.78, 95% CI 0.67 to 0.89), self-disclosure (β=-0.65, 95% CI -1.07 to -0.23), and resilience (β=0.07, 95% CI 0.01 to 0.13) as significant predictors of week 4 loneliness (R2=0.64). Baseline social anxiety (β=0.92, 95% CI 0.81 to 1.03) significantly predicted week 4 anxiety (R2=0.65). These findings indicate that higher baseline loneliness, lower self-disclosure to the chatbot, and higher resilience significantly predicted higher loneliness at week 4. Additionally, higher baseline social anxiety significantly predicted higher social anxiety at week 4. Qualitative analysis highlighted the chatbot's empathy and support as features for reliability, though issues such as inconsistent responses and excessive enthusiasm occasionally disrupted user immersion. Conclusions: Social chatbots may have the potential to mitigate feelings of loneliness and social anxiety, indicating their possible utility as complementary resources in mental health interventions. User insights emphasize the importance of empathy, accessibility, and structured conversations in achieving therapeutic goals. 
Trial Registration: Clinical Research Information Service (CRIS) KCT0009288; https://tinyurl.com/hxrznt3t UR - https://www.jmir.org/2025/1/e65589 UR - http://dx.doi.org/10.2196/65589 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/65589 ER - TY - JOUR AU - Merkel, Sebastian AU - Schorr, Sabrina PY - 2025/1/13 TI - Identification of Use Cases, Target Groups, and Motivations Around Adopting Smart Speakers for Health Care and Social Care Settings: Scoping Review JO - JMIR AI SP - e55673 VL - 4 KW - conversational agents KW - smart speaker KW - health care KW - social care KW - digitalization KW - scoping review KW - mobile phone N2 - Background: Conversational agents (CAs) are finding increasing application in health and social care, not least due to their growing use in the home. Recent developments in artificial intelligence, machine learning, and natural language processing have enabled a variety of new uses for CAs. One type of CA that has received increasing attention recently is smart speakers. Objective: The aim of our study was to identify the use cases, user groups, and settings of smart speakers in health and social care. We also wanted to identify the key motivations for developers and designers to use this particular type of technology. Methods: We conducted a scoping review to provide an overview of the literature on smart speakers in health and social care. The literature search was conducted between February 2023 and March 2023 and included 3 databases (PubMed, Scopus, and Sociological Abstracts), supplemented by Google Scholar. Several keywords were used, including technology (eg, voice assistant), product name (eg, Amazon Alexa), and setting (health care or social care). Publications were included if they met the predefined inclusion criteria: (1) published after 2015 and (2) used a smart speaker in a health care or social care setting. 
Publications were excluded if they met one of the following criteria: (1) did not report on the specific devices used, (2) did not focus specifically on smart speakers, (3) were systematic reviews and other forms of literature-based publications, and (4) were not published in English. Two reviewers collected, reviewed, abstracted, and analyzed the data using qualitative content analysis. Results: A total of 27 articles were included in the final review. These articles covered a wide range of use cases in different settings, such as private homes, hospitals, long-term care facilities, and outpatient services. The main target group was patients, especially older users, followed by doctors and other medical staff members. Conclusions: The results show that smart speakers have diverse applications in health and social care, addressing different contexts and audiences. Their affordability and easy-to-use interfaces make them attractive to various stakeholders. It seems likely that, due to technical advances in artificial intelligence and the market power of the companies behind the devices, there will be more use cases for smart speakers in the near future. UR - https://ai.jmir.org/2025/1/e55673 UR - http://dx.doi.org/10.2196/55673 UR - http://www.ncbi.nlm.nih.gov/pubmed/39804689 ID - info:doi/10.2196/55673 ER - TY - JOUR AU - Zhang, Yong AU - Lu, Xiao AU - Luo, Yan AU - Zhu, Ying AU - Ling, Wenwu PY - 2025/1/9 TI - Performance of Artificial Intelligence Chatbots on Ultrasound Examinations: Cross-Sectional Comparative Analysis JO - JMIR Med Inform SP - e63924 VL - 13 KW - chatbots KW - ChatGPT KW - ERNIE Bot KW - performance KW - accuracy rates KW - ultrasound KW - language KW - examination N2 - Background: Artificial intelligence chatbots are being increasingly used for medical inquiries, particularly in the field of ultrasound medicine. However, their performance varies and is influenced by factors such as language, question type, and topic. 
Objective: This study aimed to evaluate the performance of ChatGPT and ERNIE Bot in answering ultrasound-related medical examination questions, providing insights for users and developers. Methods: We curated 554 questions from ultrasound medicine examinations, covering various question types and topics. The questions were posed in both English and Chinese. Objective questions were scored based on accuracy rates, whereas subjective questions were rated by 5 experienced doctors using a Likert scale. The data were analyzed in Excel. Results: Of the 554 questions included in this study, single-choice questions comprised the largest share (354/554, 64%), followed by short answers (69/554, 12%) and noun explanations (63/554, 11%). The accuracy rates for objective questions ranged from 8.33% to 80%, with true or false questions scoring highest. Subjective questions received acceptability rates ranging from 47.62% to 75.36%. ERNIE Bot was superior to ChatGPT in many aspects (P<.05). Both models showed a performance decline in English, but ERNIE Bot's decline was less significant. The models performed better in terms of basic knowledge, ultrasound methods, and diseases than in terms of ultrasound signs and diagnosis. Conclusions: Chatbots can provide valuable ultrasound-related answers, but performance differs by model and is influenced by language, question type, and topic. In general, ERNIE Bot outperforms ChatGPT. Users and developers should understand model performance characteristics and select appropriate models for different questions and languages to optimize chatbot use. UR - https://medinform.jmir.org/2025/1/e63924 UR - http://dx.doi.org/10.2196/63924 ID - info:doi/10.2196/63924 ER - TY - JOUR AU - Chau, A. 
Courtney AU - Feng, Hao AU - Cobos, Gabriela AU - Park, Joyce PY - 2025/1/7 TI - The Comparative Sufficiency of ChatGPT, Google Bard, and Bing AI in Answering Diagnosis, Treatment, and Prognosis Questions About Common Dermatological Diagnoses JO - JMIR Dermatol SP - e60827 VL - 8 KW - artificial intelligence KW - AI KW - ChatGPT KW - atopic dermatitis KW - acne vulgaris KW - cyst KW - actinic keratosis KW - rosacea KW - diagnosis KW - treatment KW - prognosis KW - dermatological KW - patient KW - chatbot KW - dermatologist UR - https://derma.jmir.org/2025/1/e60827 UR - http://dx.doi.org/10.2196/60827 ID - info:doi/10.2196/60827 ER - TY - JOUR AU - Cheng, Sheung-Tak AU - Ng, F. Peter H. PY - 2025/1/6 TI - The PDC30 Chatbot–Development of a Psychoeducational Resource on Dementia Caregiving Among Family Caregivers: Mixed Methods Acceptability Study JO - JMIR Aging SP - e63715 VL - 8 KW - Alzheimer KW - caregiving KW - chatbot KW - conversational artificial intelligence KW - dementia KW - digital health KW - health care technology KW - psychoeducational KW - medical innovations KW - language models KW - mobile phone N2 - Background: Providing ongoing support to the increasing number of caregivers as their needs change in the long-term course of dementia is a severe challenge to any health care system. Conversational artificial intelligence (AI) operating 24/7 may help to tackle this problem. Objective: This study describes the development of a generative AI chatbot, the PDC30 Chatbot, and evaluates its acceptability in a mixed methods study. Methods: The PDC30 Chatbot was developed using the GPT-4o large language model, with a personality agent to constrain its behavior to provide advice on dementia caregiving based on the Positive Dementia Caregiving in 30 Days Guidebook, a laypeople's resource based on a validated training manual for dementia caregivers. 
The PDC30 Chatbot's responses to 21 common questions were compared with those of ChatGPT and another chatbot (called Chatbot-B) as standards of reference. Chatbot-B was constructed using the PDC30 Chatbot's architecture but replaced the latter's knowledge base with a collection of authoritative sources, including the World Health Organization's iSupport, By Us For Us Guides, and 185 web pages or manuals by the Alzheimer's Association, National Institute on Aging, and UK Alzheimer's Society. In the next phase, to assess the acceptability of the PDC30 Chatbot, 21 family caregivers used the PDC30 Chatbot for 2 weeks and provided ratings and comments on its acceptability. Results: Among the three chatbots, ChatGPT's responses tended to be repetitive and not specific enough. The PDC30 Chatbot and Chatbot-B, by virtue of their design, produced highly context-sensitive advice, with the former performing slightly better when the questions conveyed significant psychological distress on the part of the caregiver. In the acceptability study, caregivers found the PDC30 Chatbot highly user-friendly, and its responses quite helpful and easy to understand. They were rather satisfied with it and would strongly recommend it to other caregivers. During the 2-week trial period, the majority used the chatbot more than once per day. Thematic analysis of their written feedback revealed three major themes: helpfulness, accessibility, and improved attitude toward AI. Conclusions: The PDC30 Chatbot provides quality responses to caregiver questions, which are well-received by caregivers. Conversational AI is a viable approach to improve the support of caregivers. 
UR - https://aging.jmir.org/2025/1/e63715 UR - http://dx.doi.org/10.2196/63715 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/63715 ER - TY - JOUR AU - Das, Sudeshna AU - Ge, Yao AU - Guo, Yuting AU - Rajwal, Swati AU - Hairston, JaMor AU - Powell, Jeanne AU - Walker, Drew AU - Peddireddy, Snigdha AU - Lakamana, Sahithi AU - Bozkurt, Selen AU - Reyna, Matthew AU - Sameni, Reza AU - Xiao, Yunyu AU - Kim, Sangmi AU - Chandler, Rasheeta AU - Hernandez, Natalie AU - Mowery, Danielle AU - Wightman, Rachel AU - Love, Jennifer AU - Spadaro, Anthony AU - Perrone, Jeanmarie AU - Sarker, Abeed PY - 2025/1/6 TI - Two-Layer Retrieval-Augmented Generation Framework for Low-Resource Medical Question Answering Using Reddit Data: Proof-of-Concept Study JO - J Med Internet Res SP - e66220 VL - 27 KW - retrieval-augmented generation KW - substance use KW - social media KW - large language models KW - natural language processing KW - artificial intelligence KW - GPT KW - psychoactive substance N2 - Background: The increasing use of social media to share lived and living experiences of substance use presents a unique opportunity to obtain information on side effects, use patterns, and opinions on novel psychoactive substances. However, due to the large volume of data, obtaining useful insights through natural language processing technologies such as large language models is challenging. Objective: This paper aims to develop a retrieval-augmented generation (RAG) architecture for medical question answering pertaining to clinicians' queries on emerging issues associated with health-related topics, using user-generated medical information on social media. Methods: We proposed a two-layer RAG framework for query-focused answer generation and evaluated a proof of concept for the framework in the context of query-focused summary generation from social media forums, focusing on emerging drug-related information. 
Our modular framework generates individual summaries followed by an aggregated summary to answer medical queries from large amounts of user-generated social media data in an efficient manner. We compared the performance of a quantized large language model (Nous-Hermes-2-7B-DPO), deployable in low-resource settings, with GPT-4. For this proof-of-concept study, we used user-generated data from Reddit to answer clinicians' questions on the use of xylazine and ketamine. Results: Our framework achieves comparable median scores in terms of relevance, length, hallucination, coverage, and coherence when evaluated using GPT-4 and Nous-Hermes-2-7B-DPO, evaluated for 20 queries with 76 samples. There was no statistically significant difference between GPT-4 and Nous-Hermes-2-7B-DPO for coverage (Mann-Whitney U=733.0; n1=37; n2=39; P=.89 two-tailed), coherence (U=670.0; n1=37; n2=39; P=.49 two-tailed), relevance (U=662.0; n1=37; n2=39; P=.15 two-tailed), length (U=672.0; n1=37; n2=39; P=.55 two-tailed), and hallucination (U=859.0; n1=37; n2=39; P=.01 two-tailed). A statistically significant difference was noted for the Coleman-Liau Index (U=307.5; n1=20; n2=16; P<.001 two-tailed). Conclusions: Our RAG framework can effectively answer medical questions about targeted topics and can be deployed in resource-constrained settings. 
UR - https://www.jmir.org/2025/1/e66220 UR - http://dx.doi.org/10.2196/66220 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/66220 ER - TY - JOUR AU - Schafer, Moa AU - Lachman, Jamie AU - Zinser, Paula AU - Calderón Alfaro, Antonio Francisco AU - Han, Qing AU - Facciola, Chiara AU - Clements, Lily AU - Gardner, Frances AU - Haupt Ronnie, Genevieve AU - Sheil, Ross PY - 2025/1/3 TI - A Digital Parenting Intervention With Intimate Partner Violence Prevention Content: Quantitative Pre-Post Pilot Study JO - JMIR Form Res SP - e58611 VL - 9 KW - intimate partner violence KW - SMS text messaging KW - chatbot KW - user engagement KW - parenting KW - violence KW - mobile phone N2 - Background: Intimate partner violence (IPV) and violence against children are global issues with severe consequences. Intersections shared by the 2 forms of violence have led to calls for joint programming efforts to prevent both IPV and violence against children. Parenting programs have been identified as a key entry point for addressing multiple forms of family violence. Building on the IPV prevention material that has been integrated into the parenting program ParentText, a digital parenting chatbot, this pilot study seeks to explore parents' engagement with the IPV prevention content in ParentText and explore preliminary changes in IPV. Objective: This study aimed to assess parents' and caregivers' level of engagement with the IPV prevention material in the ParentText chatbot and explore preliminary changes in experiences and perpetration of IPV, attitudes toward IPV, and gender-equitable behaviors following the intervention. Methods: Caregivers of children aged between 0 and 18 years were recruited through convenience sampling by research assistants in Cape Town, South Africa, and by UNICEF (United Nations Children's Fund) Jamaica staff in 3 parishes of Jamaica. 
Quantitative data from women in Jamaica (n=28) and South Africa (n=19) and men in South Africa (n=21) were collected electronically via weblinks sent to caregivers' phones using Open Data Kit. The primary outcome was IPV experience (women) and perpetration (men), with secondary outcomes including gender-equitable behaviors and attitudes toward IPV. Descriptive statistics were used to report sociodemographic characteristics and engagement outcomes. Chi-square tests and 2-tailed paired dependent-sample t tests were used to investigate potential changes in IPV outcomes between pretest and posttest. Results: The average daily interaction rate with the program was 0.57 and 0.59 interactions per day for women and men in South Africa, and 0.21 for women in Jamaica. The rate of completion of at least 1 IPV prevention topic was 25% (5/20) for women and 5% (1/20) for men in South Africa, and 21% (6/28) for women in Jamaica. Exploratory analyses indicated significant pre-post reductions in overall IPV experience among women in South Africa (P=.01) and Jamaica (P=.01) and in men's overall harmful IPV attitudes (P=.01) and increases in men's overall gender-equitable behaviors (P=.02) in South Africa. Conclusions: To the best of our knowledge, this is the first pilot study to investigate user engagement with and indicative outcomes of a digital parenting intervention with integrated IPV prevention content. Study findings provide valuable insights into user interactions with the chatbot and shed light on challenges related to low levels of chatbot engagement. Indicative results suggest promising yet modest reductions in IPV and improvements in attitudes after the program. Further research using a randomized controlled trial is warranted to establish causality. 
UR - https://formative.jmir.org/2025/1/e58611 UR - http://dx.doi.org/10.2196/58611 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/58611 ER - TY - JOUR AU - Kang, Boyoung AU - Hong, Munpyo PY - 2025/1/3 TI - Development and Evaluation of a Mental Health Chatbot Using ChatGPT 4.0: Mixed Methods User Experience Study With Korean Users JO - JMIR Med Inform SP - e63538 VL - 13 KW - mental health chatbot KW - Dr. CareSam KW - HoMemeTown KW - ChatGPT 4.0 KW - large language model KW - LLM KW - cross-lingual KW - pilot testing KW - cultural sensitivity KW - localization KW - Korean students N2 - Background: Mental health chatbots have emerged as a promising tool for providing accessible and convenient support to individuals in need. Building on our previous research on digital interventions for loneliness and depression among Korean college students, this study addresses the limitations identified and explores more advanced artificial intelligence-driven solutions. Objective: This study aimed to develop and evaluate the performance of HoMemeTown Dr. CareSam, an advanced cross-lingual chatbot using ChatGPT 4.0 (OpenAI) to provide seamless support in both English and Korean contexts. The chatbot was designed to address the need for more personalized and culturally sensitive mental health support identified in our previous work while providing an accessible and user-friendly interface for Korean young adults. Methods: We conducted a mixed methods pilot study with 20 Korean young adults aged 18 to 27 (mean 23.3, SD 1.96) years. The HoMemeTown Dr CareSam chatbot was developed using the GPT application programming interface, incorporating features such as a gratitude journal and risk detection. User satisfaction and chatbot performance were evaluated using quantitative surveys and qualitative feedback, with triangulation used to ensure the validity and robustness of findings through cross-verification of data sources. 
Comparative analyses were conducted with other large language model chatbots and existing digital therapy tools (Woebot [Woebot Health Inc] and Happify [Twill Inc]). Results: Users generally expressed positive views towards the chatbot, with positivity and support receiving the highest score on a 10-point scale (mean 9.0, SD 1.2), followed by empathy (mean 8.7, SD 1.6) and active listening (mean 8.0, SD 1.8). However, areas for improvement were noted in professionalism (mean 7.0, SD 2.0), complexity of content (mean 7.4, SD 2.0), and personalization (mean 7.4, SD 2.4). The chatbot demonstrated statistically significant performance differences compared with other large language model chatbots (F=3.27; P=.047), with more pronounced differences compared with Woebot and Happify (F=12.94; P<.001). Qualitative feedback highlighted the chatbot's strengths in providing empathetic responses and a user-friendly interface, while areas for improvement included response speed and the naturalness of Korean language responses. Conclusions: The HoMemeTown Dr CareSam chatbot shows potential as a cross-lingual mental health support tool, achieving high user satisfaction and demonstrating comparative advantages over existing digital interventions. However, the study's limited sample size and short-term nature necessitate further research. Future studies should include larger-scale clinical trials, enhanced risk detection features, and integration with existing health care systems to fully realize its potential in supporting mental well-being across different linguistic and cultural contexts. 
UR - https://medinform.jmir.org/2025/1/e63538 UR - http://dx.doi.org/10.2196/63538 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/63538 ER - TY - JOUR AU - Wang, Chenxu AU - Li, Shuhan AU - Lin, Nuoxi AU - Zhang, Xinyu AU - Han, Ying AU - Wang, Xiandi AU - Liu, Di AU - Tan, Xiaomei AU - Pu, Dan AU - Li, Kang AU - Qian, Guangwu AU - Yin, Rong PY - 2025/1/1 TI - Application of Large Language Models in Medical Training Evaluation–Using ChatGPT as a Standardized Patient: Multimetric Assessment JO - J Med Internet Res SP - e59435 VL - 27 KW - ChatGPT KW - artificial intelligence KW - standardized patient KW - health care KW - prompt engineering KW - accuracy KW - large language models KW - performance evaluation KW - medical training KW - inflammatory bowel disease N2 - Background: With the increasing interest in the application of large language models (LLMs) in the medical field, the feasibility of its potential use as a standardized patient in medical assessment is rarely evaluated. Specifically, we delved into the potential of using ChatGPT, a representative LLM, in transforming medical education by serving as a cost-effective alternative to standardized patients, specifically for history-taking tasks. Objective: The study aims to explore ChatGPT's viability and performance as a standardized patient, using prompt engineering to refine its accuracy and use in medical assessments. Methods: A 2-phase experiment was conducted. The first phase assessed feasibility by simulating conversations about inflammatory bowel disease (IBD) across 3 quality groups (good, medium, and bad). Responses were categorized based on their relevance and accuracy. Each group consisted of 30 runs, with responses scored to determine whether they were related to the inquiries. For the second phase, we evaluated ChatGPT's performance against specific criteria, focusing on its anthropomorphism, clinical accuracy, and adaptability. 
Adjustments were made to prompts based on ChatGPT's response shortcomings, with a comparative analysis of ChatGPT's performance between original and revised prompts. A total of 300 runs were conducted and compared against standard reference scores. Finally, the generalizability of the revised prompt was tested using other scripts for another 60 runs, together with the exploration of the impact of the used language on the performance of the chatbot. Results: The feasibility test confirmed ChatGPT's ability to simulate a standardized patient effectively, differentiating among poor, medium, and good medical inquiries with varying degrees of accuracy. Score differences between the poor (74.7, SD 5.44) and medium (82.67, SD 5.30) inquiry groups (P<.001), between the poor and good (85, SD 3.27) inquiry groups (P<.001) were significant at a significance level (α) of .05, while the score differences between the medium and good inquiry groups were not statistically significant (P=.16). The revised prompt significantly improved ChatGPT's realism, clinical accuracy, and adaptability, leading to a marked reduction in scoring discrepancies. The score accuracy of ChatGPT improved 4.926 times compared to unrevised prompts. The score difference percentage drops from 29.83% to 6.06%, with a drop in SD from 0.55 to 0.068. The performance of the chatbot on a separate script is acceptable with an average score difference percentage of 3.21%. Moreover, the performance differences between test groups using various language combinations were found to be insignificant. Conclusions: ChatGPT, as a representative LLM, is a viable tool for simulating standardized patients in medical assessments, with the potential to enhance medical training. By incorporating proper prompts, ChatGPT's scoring accuracy and response realism significantly improved, approaching the feasibility of actual clinical use. Also, the influence of the adopted language is nonsignificant on the outcome of the chatbot. 
UR - https://www.jmir.org/2025/1/e59435 UR - http://dx.doi.org/10.2196/59435 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/59435 ER - TY - JOUR AU - Li, Fan AU - Yang, Ya PY - 2024/12/24 TI - Impact of Artificial Intelligence-Generated Content Labels on Perceived Accuracy, Message Credibility, and Sharing Intentions for Misinformation: Web-Based, Randomized, Controlled Experiment JO - JMIR Form Res SP - e60024 VL - 8 KW - generative AI KW - artificial intelligence KW - ChatGPT KW - AIGC label KW - misinformation KW - perceived accuracy KW - message credibility KW - sharing intention KW - social media KW - health information N2 - Background: The proliferation of generative artificial intelligence (AI), such as ChatGPT, has added complexity and richness to the virtual environment by increasing the presence of AI-generated content (AIGC). Although social media platforms such as TikTok have begun labeling AIGC to facilitate the ability for users to distinguish it from human-generated content, little research has been performed to examine the effect of these AIGC labels. Objective: This study investigated the impact of AIGC labels on perceived accuracy, message credibility, and sharing intention for misinformation through a web-based experimental design, aiming to refine the strategic application of AIGC labels. Methods: The study conducted a 2×2×2 mixed experimental design, using the AIGC labels (presence vs absence) as the between-subjects factor and information type (accurate vs inaccurate) and content category (for-profit vs not-for-profit) as within-subjects factors. Participants, recruited via the Credamo platform, were randomly assigned to either an experimental group (with labels) or a control group (without labels). Each participant evaluated 4 sets of content, providing feedback on perceived accuracy, message credibility, and sharing intention for misinformation. 
Statistical analyses were performed using SPSS version 29 and included repeated-measures ANOVA and simple effects analysis, with significance set at P<.05. Results: As of April 2024, this study recruited a total of 957 participants, and after screening, 400 participants each were allocated to the experimental and control groups. The main effects of AIGC labels were not significant for perceived accuracy, message credibility, or sharing intention. However, the main effects of information type were significant for all 3 dependent variables (P<.001), as were the effects of content category (P<.001). There were significant differences in interaction effects among the 3 variables. For perceived accuracy, the interaction between information type and content category was significant (P=.005). For message credibility, the interaction between information type and content category was significant (P<.001). Regarding sharing intention, both the interaction between information type and content category (P<.001) and the interaction between information type and AIGC labels (P=.008) were significant. Conclusions: This study found that AIGC labels minimally affect perceived accuracy, message credibility, or sharing intention but help distinguish AIGC from human-generated content. The labels do not negatively impact users' perceptions of platform content, indicating their potential for fact-checking and governance. However, AIGC labeling applications should vary by information type; they can slightly enhance sharing intention and perceived accuracy for misinformation. This highlights the need for more nuanced strategies for AIGC labels, necessitating further research. 
UR - https://formative.jmir.org/2024/1/e60024 UR - http://dx.doi.org/10.2196/60024 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/60024 ER - TY - JOUR AU - Kianian, Reza AU - Sun, Deyu AU - Rojas-Carabali, William AU - Agrawal, Rupesh AU - Tsui, Edmund PY - 2024/12/24 TI - Large Language Models May Help Patients Understand Peer-Reviewed Scientific Articles About Ophthalmology: Development and Usability Study JO - J Med Internet Res SP - e59843 VL - 26 KW - uveitis KW - artificial intelligence KW - ChatGPT KW - readability KW - peer review KW - large language models KW - LLMs KW - health literacy KW - patient education KW - medical information KW - ophthalmology N2 - Background: Adequate health literacy has been shown to be important for the general health of a population. To address this, it is recommended that patient-targeted medical information is written at a sixth-grade reading level. To make well-informed decisions about their health, patients may want to interact directly with peer-reviewed open access scientific articles. However, studies have shown that such text is often written with highly complex language above the levels that can be comprehended by the general population. Previously, we have published on the use of large language models (LLMs) in easing the readability of patient-targeted health information on the internet. In this study, we continue to explore the advantages of LLMs in patient education. Objective: This study aimed to explore the use of LLMs, specifically ChatGPT (OpenAI), to enhance the readability of peer-reviewed scientific articles in the field of ophthalmology. Methods: A total of 12 open access, peer-reviewed papers published by the senior authors of this study (ET and RA) were selected. Readability was assessed using the Flesch-Kincaid Grade Level and Simple Measure of Gobbledygook tests. ChatGPT 4.0 was asked "I will give you the text of a peer-reviewed scientific paper. 
Considering that the recommended readability of the text is 6th grade, can you simplify the following text so that a layperson reading this text can fully comprehend it? - Insert Manuscript Text -". Appropriateness was evaluated by the 2 uveitis-trained ophthalmologists. Statistical analysis was performed in Microsoft Excel. Results: ChatGPT significantly lowered the readability and length of the selected papers from 15th to 7th grade (P<.001) while generating responses that were deemed appropriate by expert ophthalmologists. Conclusions: LLMs show promise in improving health literacy by enhancing the accessibility of peer-reviewed scientific articles and allowing the general population to interact directly with medical literature. UR - https://www.jmir.org/2024/1/e59843 UR - http://dx.doi.org/10.2196/59843 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/59843 ER - TY - JOUR AU - Miyazaki, Yuki AU - Hata, Masahiro AU - Omori, Hisaki AU - Hirashima, Atsuya AU - Nakagawa, Yuta AU - Eto, Mitsuhiro AU - Takahashi, Shun AU - Ikeda, Manabu PY - 2024/12/24 TI - Performance of ChatGPT-4o on the Japanese Medical Licensing Examination: Evaluation of Accuracy in Text-Only and Image-Based Questions JO - JMIR Med Educ SP - e63129 VL - 10 KW - medical education KW - artificial intelligence KW - clinical decision-making KW - GPT-4o KW - medical licensing examination KW - Japan KW - images KW - accuracy KW - AI technology KW - application KW - decision-making KW - image-based KW - reliability KW - ChatGPT UR - https://mededu.jmir.org/2024/1/e63129 UR - http://dx.doi.org/10.2196/63129 ID - info:doi/10.2196/63129 ER - TY - JOUR AU - Hsu, Tien-Wei AU - Liang, Chih-Sung PY - 2024/12/23 TI - Authors' 
Reply: Reassessing AI in Medicine: Exploring the Capabilities of AI in Academic Abstract Synthesis JO - J Med Internet Res SP - e65123 VL - 26 KW - ChatGPT KW - AI-generated scientific content KW - plagiarism KW - AI KW - artificial intelligence KW - NLP KW - natural language processing KW - LLM KW - language model KW - text KW - textual KW - generation KW - generative KW - extract KW - extraction KW - scientific research KW - academic research KW - publication KW - abstract KW - comparative analysis KW - reviewer bias UR - https://www.jmir.org/2024/1/e65123 UR - http://dx.doi.org/10.2196/65123 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/65123 ER - TY - JOUR AU - Wang, Zijian AU - Zhou, Chunyang PY - 2024/12/23 TI - Reassessing AI in Medicine: Exploring the Capabilities of AI in Academic Abstract Synthesis JO - J Med Internet Res SP - e55920 VL - 26 KW - ChatGPT KW - AI-generated scientific content KW - plagiarism KW - AI KW - artificial intelligence KW - NLP KW - natural language processing KW - LLM KW - language model KW - text KW - textual KW - generation KW - generative KW - extract KW - extraction KW - scientific research KW - academic research KW - publication KW - abstract KW - comparative analysis KW - reviewer bias UR - https://www.jmir.org/2024/1/e55920 UR - http://dx.doi.org/10.2196/55920 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/55920 ER - TY - JOUR AU - Sprint, Gina AU - Schmitter-Edgecombe, Maureen AU - Cook, Diane PY - 2024/12/23 TI - Building a Human Digital Twin (HDTwin) Using Large Language Models for Cognitive Diagnosis: Algorithm Development and Validation JO - JMIR Form Res SP - e63866 VL - 8 KW - human digital twin KW - cognitive health KW - cognitive diagnosis KW - large language models KW - artificial intelligence KW - machine learning KW - digital behavior marker KW - interview marker KW - health information KW - chatbot KW - digital twin KW - smartwatch N2 - Background: Human digital twins have the 
potential to change the practice of personalizing cognitive health diagnosis because these systems can integrate multiple sources of health information and influence into a unified model. Cognitive health is multifaceted, yet researchers and clinical professionals struggle to align diverse sources of information into a single model. Objective: This study aims to introduce a method called HDTwin for unifying heterogeneous data using large language models. HDTwin is designed to predict cognitive diagnoses and offer explanations for its inferences. Methods: HDTwin integrates cognitive health data from multiple sources, including demographic, behavioral, ecological momentary assessment, n-back test, speech, and baseline experimenter testing session markers. Data are converted into text prompts for a large language model. The system then combines these inputs with relevant external knowledge from scientific literature to construct a predictive model. The model's performance is validated using data from 3 studies involving 124 participants, comparing its diagnostic accuracy with baseline machine learning classifiers. Results: HDTwin achieves a peak accuracy of 0.81 based on the automated selection of markers, significantly outperforming baseline classifiers. On average, HDTwin yielded accuracy=0.77, precision=0.88, recall=0.63, and Matthews correlation coefficient=0.57. In comparison, the baseline classifiers yielded average accuracy=0.65, precision=0.86, recall=0.35, and Matthews correlation coefficient=0.36. The experiments also reveal that HDTwin yields superior predictive accuracy when information sources are fused compared to single sources. HDTwin's chatbot interface provides interactive dialogues, aiding in diagnosis interpretation and allowing further exploration of patient data. Conclusions: HDTwin integrates diverse cognitive health data, enhancing the accuracy and explainability of cognitive diagnoses.
This approach outperforms traditional models and provides an interface for navigating patient information. The approach shows promise for improving early detection and intervention strategies in cognitive health. UR - https://formative.jmir.org/2024/1/e63866 UR - http://dx.doi.org/10.2196/63866 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/63866 ER - TY - JOUR AU - Ruggiano, Nicole AU - Brown, Leslie Ellen AU - Clarke, J. Peter AU - Hristidis, Vagelis AU - Roberts, Lisa AU - Framil Suarez, Victoria Carmen AU - Allala, Chaithra Sai AU - Hurley, Shannon AU - Kopcsik, Chrystine AU - Daquin, Jane AU - Chevez, Hamilton AU - Chang-Lau, Raymond AU - Agronin, Marc AU - Geldmacher, S. David PY - 2024/12/23 TI - An Evidence-Based IT Program With Chatbot to Support Caregiving and Clinical Care for People With Dementia: The CareHeroes Development and Usability Pilot JO - JMIR Aging SP - e57308 VL - 7 KW - Alzheimer disease KW - artificial intelligence KW - caregivers KW - chatbot KW - dementia KW - mobile applications KW - conversational agent KW - design KW - apps N2 - Background: There are numerous communication barriers between family caregivers and providers of people living with dementia, which can pose challenges to caregiving and clinical decision-making. To address these barriers, a new web and mobile-enabled app, called CareHeroes, was developed, which promotes the collection and secured sharing of clinical information between caregivers and providers. It also provides caregiver support and education. Objective: The primary study objective was to examine whether dementia caregivers would use CareHeroes as an adjunct to care and gather psychosocial data from those who used the app. Methods: This paper presents the implementation process used to integrate CareHeroes into clinical care at 2 memory clinics and preliminary outcome evaluation. 
Family caregivers receiving services at clinics were asked to use the app for a 12-month period to collect, track, and share clinical information with the care recipient's provider. They also used it to assess their own mental health symptoms. Psychosocial outcomes were assessed through telephone interviews, and user data were collected by the app. Results: A total of 21 caregivers enrolled in the pilot study across the 2 memory clinics. Usage data indicated that caregivers used many of the features in the CareHeroes app, though the chatbot was the most frequently used feature. Outcome data indicated that caregivers' depression was lower at 3-month follow-up (t11=2.03, P=.03). Conclusions: Recruitment and retention of the pilot study were impacted by COVID-19 restrictions, and therefore more testing is needed with a larger sample to determine the potential impact of CareHeroes on caregivers' mental health. Despite this limitation, the pilot study demonstrated that integrating a new supportive app for caregivers as an adjunct to clinical dementia care is feasible. Implications for future technology intervention development, implementation planning, and testing for caregivers of people living with dementia are discussed.
UR - https://aging.jmir.org/2024/1/e57308 UR - http://dx.doi.org/10.2196/57308 ID - info:doi/10.2196/57308 ER - TY - JOUR AU - Gong, Jeong Eun AU - Bang, Seok Chang AU - Lee, Jun Jae AU - Park, Jonghyung AU - Kim, Eunsil AU - Kim, Subeen AU - Kimm, Minjae AU - Choi, Seoung-Ho PY - 2024/12/20 TI - Large Language Models in Gastroenterology: Systematic Review JO - J Med Internet Res SP - e66648 VL - 26 KW - large language model KW - LLM KW - deep learning KW - artificial intelligence KW - AI KW - endoscopy KW - gastroenterology KW - clinical practice KW - systematic review KW - diagnostic KW - accuracy KW - patient engagement KW - emotional support KW - data privacy KW - diagnosis KW - clinical reasoning N2 - Background: As health care continues to evolve with technological advancements, the integration of artificial intelligence into clinical practices has shown promising potential to enhance patient care and operational efficiency. Among the forefront of these innovations are large language models (LLMs), a subset of artificial intelligence designed to understand, generate, and interact with human language at an unprecedented scale. Objective: This systematic review describes the role of LLMs in improving diagnostic accuracy, automating documentation, and advancing specialist education and patient engagement within the field of gastroenterology and gastrointestinal endoscopy. Methods: Core databases including MEDLINE through PubMed, Embase, and Cochrane Central registry were searched using keywords related to LLMs (from inception to April 2024). Studies were included if they satisfied the following criteria: (1) any type of studies that investigated the potential role of LLMs in the field of gastrointestinal endoscopy or gastroenterology, (2) studies published in English, and (3) studies in full-text format. 
The exclusion criteria were as follows: (1) studies that did not report the potential role of LLMs in the field of gastrointestinal endoscopy or gastroenterology, (2) case reports and review papers, (3) ineligible research objects (eg, animals or basic research), and (4) insufficient data regarding the potential role of LLMs. The Risk of Bias in Non-Randomized Studies of Interventions tool was used to evaluate the quality of the identified studies. Results: Overall, 21 studies on the potential role of LLMs in gastrointestinal disorders were included in the systematic review, and narrative synthesis was performed because of heterogeneity in the aims and methodology of the included studies. The overall risk of bias was low in 5 studies and moderate in 16 studies. The systematic review demonstrated the ability of LLMs to disseminate general medical information, offer advice for consultations, generate procedure reports automatically, or draw conclusions about the presumptive diagnosis of complex medical illnesses. Despite promising benefits, such as increased efficiency and improved patient outcomes, challenges related to data privacy, accuracy, and interdisciplinary collaboration remain. Conclusions: We highlight the importance of navigating these challenges to fully leverage LLMs in transforming gastrointestinal endoscopy practices.
Trial Registration: PROSPERO 581772; https://www.crd.york.ac.uk/prospero/ UR - https://www.jmir.org/2024/1/e66648 UR - http://dx.doi.org/10.2196/66648 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/66648 ER - TY - JOUR AU - Sorin, Vera AU - Brin, Dana AU - Barash, Yiftach AU - Konen, Eli AU - Charney, Alexander AU - Nadkarni, Girish AU - Klang, Eyal PY - 2024/12/11 TI - Large Language Models and Empathy: Systematic Review JO - J Med Internet Res SP - e52597 VL - 26 KW - empathy KW - LLMs KW - AI KW - ChatGPT KW - review methods KW - review methodology KW - systematic review KW - scoping KW - synthesis KW - foundation models KW - text-based KW - human interaction KW - emotional intelligence KW - objective metrics KW - human assessment KW - emotions KW - healthcare KW - cognitive KW - PRISMA N2 - Background: Empathy, a fundamental aspect of human interaction, is characterized as the ability to experience another being's emotions within oneself. In health care, empathy is fundamental to the interaction between health care professionals and patients. It is a uniquely human quality that large language models (LLMs) are believed to lack. Objective: We aimed to review the literature on the capacity of LLMs to demonstrate empathy. Methods: We conducted a literature search on MEDLINE, Google Scholar, PsyArXiv, medRxiv, and arXiv between December 2022 and February 2024. We included English-language full-length publications that evaluated empathy in LLMs' outputs. We excluded papers evaluating other topics related to emotional intelligence that were not specifically empathy. The included studies' results, including the LLMs used, performance in empathy tasks, and limitations of the models, along with studies' metadata, were summarized. Results: A total of 12 studies published in 2023 met the inclusion criteria. ChatGPT-3.5 (OpenAI) was evaluated in all studies, with 6 studies comparing it with other LLMs such as GPT-4, LLaMA (Meta), and fine-tuned chatbots.
Seven studies focused on empathy within a medical context. The studies reported LLMs to exhibit elements of empathy, including emotion recognition and emotional support in diverse contexts. Evaluation metrics included automatic metrics such as Recall-Oriented Understudy for Gisting Evaluation and Bilingual Evaluation Understudy, as well as subjective human evaluation. Some studies compared performance on empathy with humans, while others compared different models. In some cases, LLMs were observed to outperform humans in empathy-related tasks. For example, ChatGPT-3.5 was evaluated for its responses to patients' questions from social media, where ChatGPT's responses were preferred over those of humans in 78.6% of cases. Other studies used scores assigned subjectively by readers. One study reported a mean empathy score of 1.84-1.9 (scale 0-2) for their fine-tuned LLM, while a different study evaluating ChatGPT-based chatbots reported a mean human rating of 3.43 out of 4 for empathetic responses. Other evaluations were based on the Levels of Emotional Awareness Scale, which was reported to be higher for ChatGPT-3.5 than for humans. Another study evaluated ChatGPT and GPT-4 on soft-skills questions in the United States Medical Licensing Examination, where GPT-4 answered 90% of questions correctly. Limitations were noted, including repetitive use of empathic phrases, difficulty following initial instructions, overly lengthy responses, sensitivity to prompts, and overall subjective evaluation metrics influenced by the evaluator's background. Conclusions: LLMs exhibit elements of cognitive empathy, recognizing emotions and providing emotionally supportive responses in various contexts. Since social skills are an integral part of intelligence, these advancements bring LLMs closer to human-like interactions and expand their potential use in applications requiring emotional intelligence.
However, there remains room for improvement in both the performance of these models and the evaluation strategies used for assessing soft skills. UR - https://www.jmir.org/2024/1/e52597 UR - http://dx.doi.org/10.2196/52597 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/52597 ER - TY - JOUR AU - Cho, Seungbeom AU - Lee, Mangyeong AU - Yu, Jaewook AU - Yoon, Junghee AU - Choi, Jae-Boong AU - Jung, Kyu-Hwan AU - Cho, Juhee PY - 2024/12/11 TI - Leveraging Large Language Models for Improved Understanding of Communications With Patients With Cancer in a Call Center Setting: Proof-of-Concept Study JO - J Med Internet Res SP - e63892 VL - 26 KW - large language model KW - cancer KW - supportive care KW - LLMs KW - patient communication KW - natural language processing KW - NLP KW - self-management KW - teleconsultation KW - triage services KW - telephone consultations N2 - Background: Hospital call centers play a critical role in providing support and information to patients with cancer, making it crucial to effectively identify and understand patient intent during consultations. However, operational efficiency and standardization of telephone consultations, particularly when categorizing diverse patient inquiries, remain significant challenges. While traditional deep learning models like long short-term memory (LSTM) and bidirectional encoder representations from transformers (BERT) have been used to address these issues, they heavily depend on annotated datasets, which are labor-intensive and time-consuming to generate. Large language models (LLMs) like GPT-4, with their in-context learning capabilities, offer a promising alternative for classifying patient intent without requiring extensive retraining. Objective: This study evaluates the performance of GPT-4 in classifying the purpose of telephone consultations of patients with cancer. 
In addition, it compares the performance of GPT-4 to that of discriminative models, such as LSTM and BERT, with a particular focus on their ability to manage ambiguous and complex queries. Methods: We used a dataset of 430,355 sentences from telephone consultations with patients with cancer between 2016 and 2020. LSTM and BERT models were trained on 300,000 sentences using supervised learning, while GPT-4 was applied using zero-shot and few-shot approaches without explicit retraining. The accuracy of each model was compared using 1,000 randomly selected sentences from 2020 onward, with special attention paid to how each model handled ambiguous or uncertain queries. Results: GPT-4, using only a few examples (few-shot prompting), attained a remarkable accuracy of 85.2%, considerably outperforming the LSTM and BERT models, which achieved accuracies of 73.7% and 71.3%, respectively. Notably, categories such as "Treatment," "Rescheduling," and "Symptoms" involve multiple contexts and exhibit significant complexity. GPT-4 demonstrated more than 15% superior performance in handling ambiguous queries in these categories. In addition, GPT-4 excelled in categories like "Records" and "Routine," where contextual clues were clear, outperforming the discriminative models. These findings emphasize the potential of LLMs, particularly GPT-4, for interpreting complicated patient interactions during cancer-related telephone consultations. Conclusions: This study shows the potential of GPT-4 to significantly improve the classification of patient intent in cancer-related telephone consultations. GPT-4's ability to handle complex and ambiguous queries without extensive retraining provides a substantial advantage over discriminative models like LSTM and BERT. While GPT-4 demonstrates strong performance in various areas, further refinement of prompt design and category definitions is necessary to fully leverage its capabilities in practical health care applications.
Future research will explore the integration of LLMs like GPT-4 into hybrid systems that combine human oversight with artificial intelligence-driven technologies. UR - https://www.jmir.org/2024/1/e63892 UR - http://dx.doi.org/10.2196/63892 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/63892 ER - TY - JOUR AU - Gutiérrez Maquilón, Rodrigo AU - Uhl, Jakob AU - Schrom-Feiertag, Helmut AU - Tscheligi, Manfred PY - 2024/12/11 TI - Integrating GPT-Based AI into Virtual Patients to Facilitate Communication Training Among Medical First Responders: Usability Study of Mixed Reality Simulation JO - JMIR Form Res SP - e58623 VL - 8 KW - medical first responders KW - verbal communication skills KW - training KW - virtual patient KW - generative artificial intelligence KW - GPT KW - large language models KW - prompt engineering KW - mixed reality N2 - Background: Training in social-verbal interactions is crucial for medical first responders (MFRs) to assess a patient's condition and perform urgent treatment during emergency medical service administration. Integrating conversational agents (CAs) in virtual patients (VPs), that is, digital simulations, is a cost-effective alternative to resource-intensive human role-playing. There is moderate evidence that CAs improve communication skills more effectively when used with instructional interventions. However, more recent GPT-based artificial intelligence (AI) produces richer, more diverse, and more natural responses than previous CAs and has control of prosodic voice qualities like pitch and duration. These functionalities have the potential to better match the interaction expectations of MFRs regarding habitability. Objective: We aimed to study how the integration of GPT-based AI in a mixed reality (MR) VP could support communication training of MFRs. Methods: We developed an MR simulation of a traffic accident with a VP.
ChatGPT (OpenAI) was integrated into the VP and prompted with verified characteristics of accident victims. MFRs (N=24) were instructed on how to interact with the MR scenario. After assessing and treating the VP, the MFRs were administered the Mean Opinion Scale-Expanded, version 2, and the Subjective Assessment of Speech System Interfaces questionnaires to study their perception of the voice quality and the usability of the voice interactions, respectively. Open-ended questions were asked after completing the questionnaires. The observed and logged interactions with the VP, descriptive statistics of the questionnaires, and the output of the open-ended questions are reported. Results: The usability assessment of the VP resulted in moderately positive ratings, especially in habitability (median 4.25, IQR 4-4.81) and likeability (median 4.50, IQR 3.97-5.91). Interactions were negatively affected by the approximately 3-second latency of the responses. MFRs acknowledged the naturalness of determining the physiological states of the VP through verbal communication, for example, with questions such as "Where does it hurt?" However, the question-answer dynamic of the verbal exchange and the VP's inability to initiate the exchange were noted. Noteworthy insights highlighted the potential of domain-knowledge prompt engineering to steer the actions of MFRs for effective training. Conclusions: Generative AI in VPs facilitates MFRs' training but continues to rely on instructions for effective verbal interactions. Therefore, the capabilities of the GPT-VP and a training protocol need to be communicated to trainees. Future interactions should implement triggers based on keyword recognition, have the VP point to the hurting area, use conversational turn-taking techniques, and add the ability for the VP to initiate a verbal exchange.
Furthermore, a local AI server, chunk processing, and lowering the audio resolution of the VP's voice could ameliorate the delay in response and allay privacy concerns. Prompting could be used in future studies to create a virtual MFR capable of assisting trainees. UR - https://formative.jmir.org/2024/1/e58623 UR - http://dx.doi.org/10.2196/58623 UR - http://www.ncbi.nlm.nih.gov/pubmed/39661979 ID - info:doi/10.2196/58623 ER - TY - JOUR AU - Dzuali, Fiatsogbe AU - Seiger, Kira AU - Novoa, Roberto AU - Aleshin, Maria AU - Teng, Joyce AU - Lester, Jenna AU - Daneshjou, Roxana PY - 2024/12/10 TI - ChatGPT May Improve Access to Language-Concordant Care for Patients With Non-English Language Preferences JO - JMIR Med Educ SP - e51435 VL - 10 KW - ChatGPT KW - artificial intelligence KW - language KW - translation KW - health care disparity KW - natural language model KW - survey KW - patient education KW - preference KW - human language KW - language-concordant care UR - https://mededu.jmir.org/2024/1/e51435 UR - http://dx.doi.org/10.2196/51435 ID - info:doi/10.2196/51435 ER - TY - JOUR AU - Bosco, Cristina AU - Shojaei, Fereshtehossadat AU - Theisz, Andrew Alec AU - Osorio Torres, John AU - Cureton, Bianca AU - Himes, K. Anna AU - Jessup, M. Nenette AU - Barnes, A. Priscilla AU - Lu, Yvonne AU - Hendrie, C. Hugh AU - Hill, V. Carl AU - Shih, C.
Patrick PY - 2024/12/9 TI - Testing 3 Modalities (Voice Assistant, Chatbot, and Mobile App) to Assist Older African American and Black Adults in Seeking Information on Alzheimer Disease and Related Dementias: Wizard of Oz Usability Study JO - JMIR Form Res SP - e60650 VL - 8 KW - older African American and Black adults KW - Alzheimer disease and related dementias KW - health literacy KW - Wizard of Oz KW - voice assistant KW - chatbot KW - mobile app KW - dementia KW - geriatric KW - aging KW - Alzheimer disease KW - artificial intelligence KW - AI KW - mHealth KW - digital tools N2 - Background: Older African American and Black adults are twice as likely to develop Alzheimer disease and related dementias (ADRD) and have the lowest level of ADRD health literacy compared to any other ethnic group in the United States. Low health literacy concerning ADRD negatively impacts African American and Black people in accessing adequate health care. Objective: This study explored how 3 technological modalities (voice assistants, chatbots, and mobile apps) can assist older African American and Black adults in accessing ADRD information to improve ADRD health literacy. By testing each modality independently, the focus could be kept on understanding the unique needs and challenges of this population concerning the use of each modality when accessing ADRD-related information. Methods: Using the Wizard of Oz usability testing method, we assessed the 3 modalities with a sample of 15 older African American and Black adults aged >55 years. The 15 participants were asked to interact with the 3 modalities to search for information on local events happening in their geographical area and search for ADRD-related health information. Results: Our findings revealed that, across the 3 modalities, the content should avoid convoluted and complex language and should allow users to save, store, and share it in order to be fully accessible to this population.
In addition, content should come from credible sources, including information tailored to the participants' cultural values, as it has to be culturally relevant for African American and Black communities. Finally, the interaction with the tool must be time efficient, and it should be adapted to the user's needs to foster a sense of control and representation. Conclusions: We conclude that, when designing ADRD-related interventions for African American and Black older adults, it is crucial to tailor the content provided by the technology to the community's values and to construct an interaction with the technology that is built on African American and Black communities' needs and demands. UR - https://formative.jmir.org/2024/1/e60650 UR - http://dx.doi.org/10.2196/60650 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/60650 ER - TY - JOUR AU - Ali, H. Shahmir AU - Rahman, Fardin AU - Kuwar, Aakanksha AU - Khanna, Twesha AU - Nayak, Anika AU - Sharma, Priyanshi AU - Dasraj, Sarika AU - Auer, Sian AU - Rouf, Rejowana AU - Patel, Tanvi AU - Dhar, Biswadeep PY - 2024/12/9 TI - Rapid, Tailored Dietary and Health Education Through A Social Media Chatbot Microintervention: Development and Usability Study With Practical Recommendations JO - JMIR Form Res SP - e52032 VL - 8 KW - social media KW - chatbot KW - conversational agent KW - intervention KW - diet KW - health education KW - feasibility KW - microintervention KW - innovation KW - dietary education KW - social media chatbot KW - public health professional KW - young adult KW - Asian KW - curriculum N2 - Background: There is an urgent need to innovate methods of health education, which can often be resource- and time-intensive.
Microinterventions have shown promise as a platform for rapid, tailored resource dissemination yet have been underexplored as a method of standardized health or dietary education; social media chatbots display unique potential as a modality for accessible, efficient, and affordable educational microinterventions. Objective: This study aims to provide public health professionals with practical recommendations on the use of social media chatbots for health education by (1) documenting the development of a novel social media chatbot intervention aimed at improving dietary attitudes and self-efficacy among South Asian American young adults and (2) describing the applied experiences of implementing the chatbot, along with user experience and engagement data. Methods: In 2023, the "Roti" chatbot was developed on Facebook and Instagram to administer a 4-lesson tailored dietary health curriculum, informed by formative research and the Theory of Planned Behavior, to 18- to 29-year-old South Asian American participants (recruited through social media from across the United States). Each lesson (10-15 minutes) consisted of 40-50 prescripted interactive texts with the chatbot (including multiple-choice and open-response questions). A preintervention survey determined which lesson(s) were suggested to participants based on their unique needs, followed by a postintervention survey informed by the Theory of Planned Behavior to assess changes in attitudes, self-efficacy, and user experiences (User Experience Questionnaire). This study uses a cross-sectional design to examine postintervention user experiences, engagement, challenges encountered, and solutions developed during the chatbot implementation. Results: Data from 168 participants of the intervention (n=92, 54.8% Facebook; n=76, 45.2% Instagram) were analyzed (mean age 24.5, SD 3.1 years; n=129, 76.8% female).
Participants completed an average of 2.6 lessons (13.9 minutes per lesson) and answered an average of 75% of questions asked by the chatbot. Most reported a positive chatbot experience (User Experience Questionnaire: 1.34; 81/116, 69.8% positive), with pragmatic quality (ease of use) being higher than hedonic quality (how interesting it felt; 88/116, 75.9% vs 64/116, 55.2% positive evaluation); younger participants reported greater hedonic quality (P=.04). On a scale out of 10 (highest agreement), participants reported that the chatbot was relevant (8.53), that they learned something new (8.24), and that the chatbot was helpful (8.28). Qualitative data revealed an appreciation for the cheerful, interactive messaging of the chatbot and outlined areas of improvement for the length, timing, and scope of text content. Quick replies, checkpoints, online forums, and self-administered troubleshooting were some solutions developed to meet the challenges experienced. Conclusions: The implementation of a standardized, tailored health education curriculum through an interactive social media chatbot displayed strong feasibility. Lessons learned from challenges encountered and user input provide a tangible roadmap for future exploration of such chatbots for accessible, engaging health interventions. 
UR - https://formative.jmir.org/2024/1/e52032 UR - http://dx.doi.org/10.2196/52032 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/52032 ER - TY - JOUR AU - Sebo, Paul PY - 2024/12/9 TI - Use of ChatGPT to Explore Gender and Geographic Disparities in Scientific Peer Review JO - J Med Internet Res SP - e57667 VL - 26 KW - Africa KW - artificial intelligence KW - discrimination KW - peer review KW - sentiment analysis KW - ChatGPT KW - disparity KW - gender KW - geographic KW - global south KW - inequality KW - woman KW - assessment KW - researcher KW - communication KW - consultation KW - gender bias N2 - Background: In the realm of scientific research, peer review serves as a cornerstone for ensuring the quality and integrity of scholarly papers. Recent trends in promoting transparency and accountability have led some journals to publish peer-review reports alongside papers. Objective: ChatGPT-4 (OpenAI) was used to quantitatively assess sentiment and politeness in peer-review reports from high-impact medical journals. The objective was to explore gender and geographical disparities to enhance inclusivity within the peer-review process. Methods: All 9 general medical journals with an impact factor >2 that publish peer-review reports were identified. A total of 12 research papers per journal were randomly selected, all published in 2023. The names of the first and last authors along with the first author's country of affiliation were collected, and the gender of both the first and last authors was determined. For each review, ChatGPT-4 was asked to evaluate the "sentiment score," ranging from -100 (negative) to 0 (neutral) to +100 (positive), and the "politeness score," ranging from -100 (rude) to 0 (neutral) to +100 (polite). The measurements were repeated 5 times and the minimum and maximum values were removed. The mean sentiment and politeness scores for each review were computed and then summarized using the median and interquartile range.
Statistical analyses included Wilcoxon rank-sum tests, Kruskal-Wallis rank tests, and negative binomial regressions. Results: Analysis of 291 peer-review reports corresponding to 108 papers unveiled notable regional disparities. Papers from the Middle East, Latin America, or Africa exhibited lower sentiment and politeness scores compared to those from North America, Europe, or Pacific and Asia (sentiment scores: 27 vs 60 and 62, respectively; politeness scores: 43.5 vs 67 and 65, respectively; adjusted P=.02). No significant differences based on authors' gender were observed (all P>.05). Conclusions: Notable regional disparities were found, with papers from the Middle East, Latin America, and Africa demonstrating significantly lower scores, while no discernible differences were observed based on authors' gender. The absence of gender-based differences suggests that gender biases may not manifest as prominently as other forms of bias within the context of peer review. The study underscores the need for targeted interventions to address regional disparities in peer review and advocates for ongoing efforts to promote equity and inclusivity in scholarly communication. UR - https://www.jmir.org/2024/1/e57667 UR - http://dx.doi.org/10.2196/57667 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/57667 ER - TY - JOUR AU - Foran, M. 
Heather AU - Kubb, Christian AU - Mueller, Janina AU - Poff, Spencer AU - Ung, Megan AU - Li, Margaret AU - Smith, Michael Eric AU - Akinyemi, Akinniyi AU - Kambadur, Melanie AU - Waller, Franziska AU - Graf, Mario AU - Boureau, Y-Lan PY - 2024/12/6 TI - An Automated Conversational Agent Self-Help Program: Randomized Controlled Trial JO - J Med Internet Res SP - e53829 VL - 26 KW - well-being KW - chatbot KW - randomized controlled trial KW - prevention KW - flourishing N2 - Background: Health promotion and growth-based interventions can effectively improve individual well-being; however, significant gaps in access and utilization still exist. Objective: This study aims to develop and test the effectiveness and implementation of a new, widely targeted conversational agent prevention program (Zenny) designed to enhance well-being. Methods: A total of 1345 individuals in the United States were recruited online and randomly assigned to either (1) a self-help program intervention delivered via an automated conversational agent on WhatsApp or (2) an active control group that had access to evidence-based wellness resources available online. The primary outcomes were well-being (measured using the 5-item World Health Organization Well-being Scale), psychosocial flourishing (assessed with the Flourishing Scale), and positive psychological health (evaluated with the Mental Health Continuum-Short Form). Outcome measures were collected at baseline and again 1 month postassessment. All analyses were conducted using an intention-to-treat approach. Results: Both groups showed significant improvements in well-being (self-help program intervention group effect size: Cohen d=0.26, P<.001; active control group effect size: d=0.24, P<.001), psychosocial flourishing (intervention: d=0.19, P<.001; active control: d=0.18, P<.001), and positive psychological health (intervention: d=0.17, P=.001; active control: d=0.24, P<.001) at postassessment. 
However, there were no significant differences in effectiveness between the 2 groups (P ranged from .56 to .92). As hypothesized a priori, a greater number of days spent actively engaging with the conversational agent was associated with larger improvements in well-being at postassessment among participants in the intervention group (β=.109, P=.04). Conclusions: The findings from this study suggest that the free conversational agent wellness self-help program was as effective as evidence-based web resources. Further research should explore strategies to increase participant engagement over time, as only a portion of participants were actively involved, and higher engagement was linked to greater improvements in well-being. Long-term follow-up studies are also necessary to assess whether these effects remain stable over time. Trial Registration: ClinicalTrials.gov NCT06208566; https://clinicaltrials.gov/ct2/show/NCT06208566; OSF Registries osf.io/ahe2r; https://doi.org/10.17605/osf.io/ahe2r UR - https://www.jmir.org/2024/1/e53829 UR - http://dx.doi.org/10.2196/53829 UR - http://www.ncbi.nlm.nih.gov/pubmed/39641985 ID - info:doi/10.2196/53829 ER - TY - JOUR AU - Wu, Yibo AU - Zhang, Jinzi AU - Ge, Pu AU - Duan, Tingyu AU - Zhou, Junyu AU - Wu, Yiwei AU - Zhang, Yuening AU - Liu, Siyu AU - Liu, Xinyi AU - Wan, Erya AU - Sun, Xinying PY - 2024/12/3 TI - Application of Chatbots to Help Patients Self-Manage Diabetes: Systematic Review and Meta-Analysis JO - J Med Internet Res SP - e60380 VL - 26 KW - artificial intelligence KW - chatbot KW - diabetes KW - health education KW - self-management KW - systematic review N2 - Background: The number of people with diabetes is on the rise globally. Self-management and health education of patients are the keys to control diabetes. 
With the development of digital therapies and artificial intelligence, chatbots have the potential to provide health-related information and improve accessibility and effectiveness in the field of patient self-management. Objective: This study systematically reviews the current research status and effectiveness of chatbots in the field of diabetes self-management to support the development of diabetes chatbots. Methods: A systematic review and meta-analysis of chatbots that can help patients with diabetes with self-management was conducted. PubMed and Web of Science databases were searched using keywords around diabetes, chatbots, conversational agents, virtual assistants, and more. The search period was from the date of creation of the databases to January 1, 2023. Research articles in English that fit the study topic were selected, and articles that did not fit the study topic or were not available in full text were excluded. Results: In total, 25 studies were included in the review. In terms of study type, all articles could be classified as systematic design studies (n=8, 32%), pilot studies (n=8, 32%), and intervention studies (n=9, 36%). Many articles adopted a nonrandomized controlled trial design in intervention studies (n=6, 24%), and there was only 1 (4%) randomized controlled trial. In terms of research strategy, all articles can be divided into quantitative studies (n=10, 40%), mixed studies (n=6, 24%), and qualitative studies (n=1, 4%). The evaluation criteria for chatbot effectiveness can be divided into technical performance evaluation, user experience evaluation, and user health evaluation. Most chatbots (n=17, 68%) provided education and management focused on patient diet, exercise, glucose monitoring, medications, and complications, and only a few studies (n=2, 8%) provided education on mental health. 
The meta-analysis found that the chatbot intervention was effective in lowering blood glucose (mean difference 0.30, 95% CI 0.04-0.55; P=.02) and had no significant effect in reducing weight (mean difference 1.41, 95% CI −2.29 to 5.11; P=.46) compared with the baseline. Conclusions: Chatbots have potential for the development of self-management for people with diabetes. However, the evidence level of current research is low, and higher level research (such as randomized controlled trials) is needed to strengthen the evidence base. Greater use of mixed methods research is needed to fully leverage the strengths of both quantitative and qualitative approaches. Appropriate and innovative theoretical frameworks should be used in the research to provide theoretical support for the study. In addition, researchers should focus on the personalized and user-friendly interactive features of chatbots, as well as improvements in study design. UR - https://www.jmir.org/2024/1/e60380 UR - http://dx.doi.org/10.2196/60380 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/60380 ER - TY - JOUR AU - Luo, Yuan AU - Miao, Yiqun AU - Zhao, Yuhan AU - Li, Jiawei AU - Chen, Yuling AU - Yue, Yuexue AU - Wu, Ying PY - 2024/12/2 TI - Comparing the Accuracy of Two Generated Large Language Models in Identifying Health-Related Rumors or Misconceptions and the Applicability in Health Science Popularization: Proof-of-Concept Study JO - JMIR Form Res SP - e63188 VL - 8 KW - rumor KW - misconception KW - health science popularization KW - health education KW - large language model KW - LLM KW - applicability KW - accuracy KW - effectiveness KW - health related KW - education KW - health science KW - proof of concept N2 - Background: Health-related rumors and misconceptions are spreading at an alarming rate, fueled by the rapid development of the internet and the exponential growth of social media platforms. 
This phenomenon has become a pressing global concern, as the dissemination of false information can have severe consequences, including widespread panic, social instability, and even public health crises. Objective: The aim of the study is to compare the accuracy of rumor identification and the effectiveness of health science popularization between 2 generated large language models in Chinese (GPT-4 by OpenAI and Enhanced Representation through Knowledge Integration Bot [ERNIE Bot] 4.0 by Baidu). Methods: In total, 20 health rumors and misconceptions, along with 10 health truths, were randomly inputted into GPT-4 and ERNIE Bot 4.0. We prompted them to determine whether the statements were rumors or misconceptions and provide explanations for their judgment. Further, we asked them to generate a health science popularization essay. We evaluated the outcomes in terms of accuracy, effectiveness, readability, and applicability. Accuracy was assessed by the rate of correctly identifying health-related rumors, misconceptions, and truths. Effectiveness was determined by the accuracy of the generated explanation, which was assessed collaboratively by 2 research team members with a PhD in nursing. Readability was calculated by the readability formula of Chinese health education materials. Applicability was evaluated by the Chinese Suitability Assessment of Materials. Results: GPT-4 and ERNIE Bot 4.0 correctly identified all health rumors and misconceptions (100% accuracy rate). For truths, the accuracy rate was 70% (7/10) and 100% (10/10), respectively. Both mostly provided widely recognized viewpoints without obvious errors. The average readability score for the health essays was 2.92 (SD 0.85) for GPT-4 and 3.02 (SD 0.84) for ERNIE Bot 4.0 (P=.65). For applicability, except for the content and cultural appropriateness category, significant differences were observed in the total score and scores in other dimensions between them (P<.05). 
Conclusions: ERNIE Bot 4.0 demonstrated similar accuracy to GPT-4 in identifying Chinese rumors. Both provided widely accepted views, despite some inaccuracies. These insights enhance understanding and correct misunderstandings. For health essays, educators can learn from readable language styles of GLLMs. Finally, ERNIE Bot 4.0 aligns with Chinese expression habits, making it a good choice for a better Chinese reading experience. UR - https://formative.jmir.org/2024/1/e63188 UR - http://dx.doi.org/10.2196/63188 ID - info:doi/10.2196/63188 ER - TY - JOUR AU - Sato, Ann AU - Haneda, Eri AU - Hiroshima, Yukihiko AU - Narimatsu, Hiroto PY - 2024/11/27 TI - Preliminary Screening for Hereditary Breast and Ovarian Cancer Using an AI Chatbot as a Genetic Counselor: Clinical Study JO - J Med Internet Res SP - e48914 VL - 26 KW - hereditary cancer KW - familial cancer KW - IBM Watson KW - family history KW - medical history KW - cancer KW - feasibility KW - social network KW - screening KW - breast cancer KW - ovarian cancer KW - artificial intelligence KW - AI KW - chatbot KW - genetic KW - counselling KW - oncology KW - conversational agent KW - implementation KW - usability KW - acceptability N2 - Background: Hereditary breast and ovarian cancer (HBOC) is a major type of hereditary cancer. Establishing effective screening to identify high-risk individuals for HBOC remains a challenge. We developed a prototype of a chatbot system that uses artificial intelligence (AI) for preliminary HBOC screening to determine whether individuals meet the National Comprehensive Cancer Network BRCA1/2 testing criteria. Objective: This study?s objective was to validate the feasibility of this chatbot in a clinical setting by using it on a patient population that visited a hospital. Methods: We validated the medical accuracy of the chatbot system by performing a test on patients who consecutively visited the Kanagawa Cancer Center. 
The participants completed a preoperation questionnaire to understand their background, including information technology literacy. After the operation, qualitative interviews were conducted to collect data on the usability and acceptability of the system and examine points needing improvement. Results: A total of 11 participants were enrolled between October and December 2020. All of the participants were women, and among them, 10 (91%) had cancer. According to the questionnaire, 6 (54%) participants had never heard of a chatbot, while 7 (64%) had never used one. All participants were able to complete the chatbot operation, and the average time required for the operation was 18.0 (SD 5.44) minutes. The determinations by the chatbot of whether the participants met the BRCA1/2 testing criteria based on their medical and family history were consistent with those by certified genetic counselors (CGCs). We compared the medical histories obtained from the participants by the CGCs with those by the chatbot. Of the 11 participants, 3 (27%) entered information different from that obtained by the CGCs. These discrepancies were caused by the participant's omissions or communication errors with the chatbot. Regarding the family histories, the chatbot provided new information for 3 (27%) of the 11 participants and complemented information for the family members of 5 (45%) participants not interviewed by the CGCs. The chatbot could not obtain some information on the family history of 6 (54%) participants due to several reasons, such as being outside of the scope of the chatbot's interview questions, the participant's omissions, and communication errors with the chatbot. Interview data were classified into the following: (1) features, (2) appearance, (3) usability and preferences, (4) concerns, (5) benefits, and (6) implementation. Favorable comments on implementation feasibility and comments on improvements were also obtained. 
Conclusions: This study demonstrated that the preliminary screening system for HBOC using an AI chatbot was feasible for real patients. UR - https://www.jmir.org/2024/1/e48914 UR - http://dx.doi.org/10.2196/48914 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/48914 ER - TY - JOUR AU - Ke, Yuhe AU - Yang, Rui AU - Lie, An Sui AU - Lim, Yi Taylor Xin AU - Ning, Yilin AU - Li, Irene AU - Abdullah, Rizal Hairil AU - Ting, Wei Daniel Shu AU - Liu, Nan PY - 2024/11/19 TI - Mitigating Cognitive Biases in Clinical Decision-Making Through Multi-Agent Conversations Using Large Language Models: Simulation Study JO - J Med Internet Res SP - e59439 VL - 26 KW - clinical decision-making KW - cognitive bias KW - generative artificial intelligence KW - large language model KW - multi-agent N2 - Background: Cognitive biases in clinical decision-making significantly contribute to errors in diagnosis and suboptimal patient outcomes. Addressing these biases presents a formidable challenge in the medical field. Objective: This study aimed to explore the role of large language models (LLMs) in mitigating these biases through the use of the multi-agent framework. We simulate the clinical decision-making processes through multi-agent conversation and evaluate its efficacy in improving diagnostic accuracy compared with humans. Methods: A total of 16 published and unpublished case reports where cognitive biases have resulted in misdiagnoses were identified from the literature. In the multi-agent framework, we leveraged GPT-4 (OpenAI) to facilitate interactions among different simulated agents to replicate clinical team dynamics. 
Each agent was assigned a distinct role: (1) making the final diagnosis after considering the discussions, (2) acting as a devil's advocate to correct confirmation and anchoring biases, (3) serving as a field expert in the required medical subspecialty, (4) facilitating discussions to mitigate premature closure bias, and (5) recording and summarizing findings. We tested varying combinations of these agents within the framework to determine which configuration yielded the highest rate of correct final diagnoses. Each scenario was repeated 5 times for consistency. The accuracy of the initial diagnoses and the final differential diagnoses were evaluated, and comparisons with human-generated answers were made using the Fisher exact test. Results: A total of 240 responses were evaluated (3 different multi-agent frameworks). The initial diagnosis had an accuracy of 0% (0/80). However, following multi-agent discussions, the accuracy for the top 2 differential diagnoses increased to 76% (61/80) for the best-performing multi-agent framework (Framework 4-C). This was significantly higher compared with the accuracy achieved by human evaluators (odds ratio 3.49; P=.002). Conclusions: The multi-agent framework demonstrated an ability to re-evaluate and correct misconceptions, even in scenarios with misleading initial investigations. In addition, the LLM-driven, multi-agent conversation framework shows promise in enhancing diagnostic accuracy in diagnostically challenging medical scenarios. 
UR - https://www.jmir.org/2024/1/e59439 UR - http://dx.doi.org/10.2196/59439 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/59439 ER - TY - JOUR AU - Ros-Arlanzón, Pablo AU - Perez-Sempere, Angel PY - 2024/11/14 TI - Evaluating AI Competence in Specialized Medicine: Comparative Analysis of ChatGPT and Neurologists in a Neurology Specialist Examination in Spain JO - JMIR Med Educ SP - e56762 VL - 10 KW - artificial intelligence KW - ChatGPT KW - clinical decision-making KW - medical education KW - medical knowledge assessment KW - OpenAI N2 - Background: With the rapid advancement of artificial intelligence (AI) in various fields, evaluating its application in specialized medical contexts becomes crucial. ChatGPT, a large language model developed by OpenAI, has shown potential in diverse applications, including medicine. Objective: This study aims to compare the performance of ChatGPT with that of attending neurologists in a real neurology specialist examination conducted in the Valencian Community, Spain, assessing the AI's capabilities and limitations in medical knowledge. Methods: We conducted a comparative analysis using the 2022 neurology specialist examination results from 120 neurologists and responses generated by ChatGPT versions 3.5 and 4. The examination consisted of 80 multiple-choice questions, with a focus on clinical neurology and health legislation. Questions were classified according to Bloom's Taxonomy. Statistical analysis of performance, including the κ coefficient for response consistency, was performed. Results: Human participants exhibited a median score of 5.91 (IQR: 4.93-6.76), with 32 neurologists failing to pass. ChatGPT-3.5 ranked 116th out of 122, answering 54.5% of questions correctly (score 3.94). ChatGPT-4 showed marked improvement, ranking 17th with 81.8% of correct answers (score 7.57), surpassing several human specialists. 
No significant variations were observed in the performance on lower-order questions versus higher-order questions. Additionally, ChatGPT-4 demonstrated increased interrater reliability, as reflected by a higher κ coefficient of 0.73, compared to ChatGPT-3.5's coefficient of 0.69. Conclusions: This study underscores the evolving capabilities of AI in medical knowledge assessment, particularly in specialized fields. ChatGPT-4's performance, outperforming the median score of human participants in a rigorous neurology examination, represents a significant milestone in AI development, suggesting its potential as an effective tool in specialized medical education and assessment. UR - https://mededu.jmir.org/2024/1/e56762 UR - http://dx.doi.org/10.2196/56762 ID - info:doi/10.2196/56762 ER - TY - JOUR AU - Nagarajan, Radha AU - Kondo, Midori AU - Salas, Franz AU - Sezgin, Emre AU - Yao, Yuan AU - Klotzman, Vanessa AU - Godambe, A. Sandip AU - Khan, Naqi AU - Limon, Alfonso AU - Stephenson, Graham AU - Taraman, Sharief AU - Walton, Nephi AU - Ehwerhemuepha, Louis AU - Pandit, Jay AU - Pandita, Deepti AU - Weiss, Michael AU - Golden, Charles AU - Gold, Adam AU - Henderson, John AU - Shippy, Angela AU - Celi, Anthony Leo AU - Hogan, R. William AU - Oermann, K. 
Eric AU - Sanger, Terence AU - Martel, Steven PY - 2024/11/14 TI - Economics and Equity of Large Language Models: Health Care Perspective JO - J Med Internet Res SP - e64226 VL - 26 KW - large language model KW - LLM KW - health care KW - economics KW - equity KW - cloud service providers KW - cloud KW - health outcome KW - implementation KW - democratization UR - https://www.jmir.org/2024/1/e64226 UR - http://dx.doi.org/10.2196/64226 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/64226 ER - TY - JOUR AU - Ming, Shuai AU - Yao, Xi AU - Guo, Xiaohong AU - Guo, Qingge AU - Xie, Kunpeng AU - Chen, Dandan AU - Lei, Bo PY - 2024/11/14 TI - Performance of ChatGPT in Ophthalmic Registration and Clinical Diagnosis: Cross-Sectional Study JO - J Med Internet Res SP - e60226 VL - 26 KW - artificial intelligence KW - chatbot KW - ChatGPT KW - ophthalmic registration KW - clinical diagnosis KW - AI KW - cross-sectional study KW - eye disease KW - eye disorder KW - ophthalmology KW - health care KW - outpatient registration KW - clinical KW - decision-making KW - generative AI KW - vision impairment N2 - Background: Artificial intelligence (AI) chatbots such as ChatGPT are expected to impact vision health care significantly. Their potential to optimize the consultation process and diagnostic capabilities across a range of ophthalmic subspecialties has yet to be fully explored. Objective: This study aims to investigate the performance of AI chatbots in recommending ophthalmic outpatient registration and diagnosing eye diseases within clinical case profiles. Methods: This cross-sectional study used clinical cases from Chinese Standardized Resident Training–Ophthalmology (2nd Edition). For each case, 2 profiles were created: patient with history (Hx) and patient with history and examination (Hx+Ex). These profiles served as independent queries for GPT-3.5 and GPT-4.0 (accessed from March 5 to 18, 2024). 
Similarly, 3 ophthalmic residents were posed the same profiles in a questionnaire format. The accuracy of recommending ophthalmic subspecialty registration was primarily evaluated using Hx profiles. The accuracy of the top-ranked diagnosis and the accuracy of the diagnosis within the top 3 suggestions (do-not-miss diagnosis) were assessed using Hx+Ex profiles. The gold standard for judgment was the published, official diagnosis. Characteristics of incorrect diagnoses by ChatGPT were also analyzed. Results: A total of 208 clinical profiles from 12 ophthalmic subspecialties were analyzed (104 Hx and 104 Hx+Ex profiles). For Hx profiles, GPT-3.5, GPT-4.0, and residents showed comparable accuracy in registration suggestions (66/104, 63.5%; 81/104, 77.9%; and 72/104, 69.2%, respectively; P=.07), with ocular trauma, retinal diseases, and strabismus and amblyopia achieving the top 3 accuracies. For Hx+Ex profiles, both GPT-4.0 and residents demonstrated higher diagnostic accuracy than GPT-3.5 (62/104, 59.6% and 63/104, 60.6% vs 41/104, 39.4%; P=.003 and P=.001, respectively). Accuracy for do-not-miss diagnoses also improved (79/104, 76% and 68/104, 65.4% vs 51/104, 49%; P<.001 and P=.02, respectively). The highest diagnostic accuracies were observed in glaucoma; lens diseases; and eyelid, lacrimal, and orbital diseases. GPT-4.0 recorded fewer incorrect top-3 diagnoses (25/42, 60% vs 53/63, 84%; P=.005) and more partially correct diagnoses (21/42, 50% vs 7/63, 11%; P<.001) than GPT-3.5, while GPT-3.5 had more completely incorrect (27/63, 43% vs 7/42, 17%; P=.005) and less precise diagnoses (22/63, 35% vs 5/42, 12%; P=.009). Conclusions: GPT-3.5 and GPT-4.0 showed intermediate performance in recommending ophthalmic subspecialties for registration. While GPT-3.5 underperformed, GPT-4.0 approached and numerically surpassed residents in differential diagnosis. AI chatbots show promise in facilitating ophthalmic patient registration. 
However, their integration into diagnostic decision-making requires further validation. UR - https://www.jmir.org/2024/1/e60226 UR - http://dx.doi.org/10.2196/60226 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/60226 ER - TY - JOUR AU - Rivera Rivera, Nathalie Jessica AU - AuBuchon, E. Katarina AU - Smith, Marjanna AU - Starling, Claire AU - Ganacias, G. Karen AU - Danielson, Aimee AU - Patchen, Loral AU - Rethy, A. Janine AU - Blumenthal, Joseph H. AU - Thomas, D. Angela AU - Arem, Hannah PY - 2024/11/14 TI - Development and Refinement of a Chatbot for Birthing Individuals and Newborn Caregivers: Mixed Methods Study JO - JMIR Pediatr Parent SP - e56807 VL - 7 KW - postpartum care KW - newborn care KW - health education KW - chatbot KW - mHealth KW - mobile health KW - feedback KW - health equity N2 - Background: The 42 days after delivery ("fourth trimester") are a high-risk period for birthing individuals and newborns, especially those who are racially and ethnically marginalized due to structural racism. Objective: To fill a gap in the critical "fourth trimester," we developed 2 rule-based chatbots (one for birthing individuals and one for newborn caregivers) that provided trusted information about postbirth warning signs and newborn care and connected patients with health care providers. Methods: A total of 4370 individuals received the newborn chatbot outreach between September 1, 2022, and December 31, 2023, and 3497 individuals received the postpartum chatbot outreach between November 16, 2022, and December 31, 2023. We conducted surveys and interviews in English and Spanish to understand the acceptability and usability of the chatbot and identify areas for improvement. We sampled from hospital discharge lists that distributed the chatbot, stratified by prenatal care location, age, type of insurance, and racial and ethnic group. 
We analyzed quantitative results using descriptive analyses in SPSS (IBM Corp) and qualitative results using deductive coding in Dedoose (SocioCultural Research Consultants). Results: Overall, 2748 (63%) individuals opened the newborn chatbot messaging, and 2244 (64%) individuals opened the postpartum chatbot messaging. A total of 100 patients engaged with the chatbot and provided survey feedback; of those, 40% (n=40) identified as Black, 27% (n=27) identified as Hispanic/Latina, and 18% (n=18) completed the survey in Spanish. Payer distribution was 55% (n=55) for individuals with public insurance, 39% (n=39) for those with commercial insurance, and 2% (n=2) for uninsured individuals. The majority of surveyed participants indicated that chatbot messaging was timely and easy to use (n=80, 80%) and found the reminders to schedule the newborn visit (n=59, 59%) and postpartum visit (n=66, 66%) useful. Across 23 interviews (n=14, 61% Black; n=4, 17% Hispanic/Latina; n=2, 9% in Spanish; n=11, 48% public insurance), 78% (n=18) of interviewees engaged with the chatbot. Interviewees provided positive feedback on usability and content and recommendations for improving the outreach messages. Conclusions: Chatbots are a promising strategy to reach birthing individuals and newborn caregivers with information about postpartum recovery and newborn care, but intentional outreach and engagement strategies are needed to optimize interaction. Future work should measure the chatbot's impact on health outcomes and reduce disparities. 
UR - https://pediatrics.jmir.org/2024/1/e56807 UR - http://dx.doi.org/10.2196/56807 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/56807 ER - TY - JOUR AU - Yang, Yanrong AU - Tavares, Jorge AU - Oliveira, Tiago PY - 2024/11/11 TI - A New Research Model for Artificial Intelligence–Based Well-Being Chatbot Engagement: Survey Study JO - JMIR Hum Factors SP - e59908 VL - 11 KW - artificial intelligence–based chatbot KW - AI-based chatbot KW - mental well-being KW - intention to engage KW - engagement behavior KW - theoretical models KW - mobile phone N2 - Background: Artificial intelligence (AI)–based chatbots have emerged as potential tools to assist individuals in reducing anxiety and supporting well-being. Objective: This study aimed to identify the factors that impact individuals' intention to engage and their engagement behavior with AI-based well-being chatbots by using a novel research model to enhance service levels, thereby improving user experience and mental health intervention effectiveness. Methods: We conducted a web-based questionnaire survey of adult users of well-being chatbots in China via social media. Our survey collected demographic data, as well as a range of measures to assess relevant theoretical factors. Finally, 256 valid responses were obtained. The newly applied model was validated through the partial least squares structural equation modeling approach. Results: The model explained 62.8% (R2) of the variance in intention to engage and 74% (R2) of the variance in engagement behavior. Affect (β=.201; P=.002), social factors (β=.184; P=.007), and compatibility (β=.149; P=.03) were statistically significant for the intention to engage. Habit (β=.154; P=.01), trust (β=.253; P<.001), and intention to engage (β=.464; P<.001) were statistically significant for engagement behavior. Conclusions: The new extended model provides a theoretical basis for studying users' AI-based chatbot engagement behavior. 
This study highlights practical points for developers of AI-based well-being chatbots. It also underscores the importance of such chatbots creating an emotional connection with their users. UR - https://humanfactors.jmir.org/2024/1/e59908 UR - http://dx.doi.org/10.2196/59908 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/59908 ER - TY - JOUR AU - Wang, Leyao AU - Wan, Zhiyu AU - Ni, Congning AU - Song, Qingyuan AU - Li, Yang AU - Clayton, Ellen AU - Malin, Bradley AU - Yin, Zhijun PY - 2024/11/7 TI - Applications and Concerns of ChatGPT and Other Conversational Large Language Models in Health Care: Systematic Review JO - J Med Internet Res SP - e22769 VL - 26 KW - large language model KW - ChatGPT KW - artificial intelligence KW - natural language processing KW - health care KW - summarization KW - medical knowledge inquiry KW - reliability KW - bias KW - privacy N2 - Background: The launch of ChatGPT (OpenAI) in November 2022 attracted public attention and academic interest to large language models (LLMs), facilitating the emergence of many other innovative LLMs. These LLMs have been applied in various fields, including health care. Numerous studies have since been conducted regarding how to use state-of-the-art LLMs in health-related scenarios. Objective: This review aims to summarize applications of and concerns regarding conversational LLMs in health care and provide an agenda for future research in this field. Methods: We used PubMed, ACM, and the IEEE digital libraries as primary sources for this review. We followed the guidance of PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) to screen and select peer-reviewed research articles that (1) were related to health care applications and conversational LLMs and (2) were published before September 1, 2023, the date when we started paper collection. We investigated these papers and classified them according to their applications and concerns. 
Results: Our search initially identified 820 papers according to targeted keywords, out of which 65 (7.9%) papers met our criteria and were included in the review. The most popular conversational LLM was ChatGPT (60/65, 92% of papers), followed by Bard (Google LLC; 1/65, 2% of papers), LLaMA (Meta; 1/65, 2% of papers), and other LLMs (6/65, 9% of papers). These papers were classified into four categories of applications: (1) summarization, (2) medical knowledge inquiry, (3) prediction (eg, diagnosis, treatment recommendation, and drug synergy), and (4) administration (eg, documentation and information collection), and four categories of concerns: (1) reliability (eg, training data quality, accuracy, interpretability, and consistency in responses), (2) bias, (3) privacy, and (4) public acceptability. There were 49 (75%) papers using LLMs for either summarization or medical knowledge inquiry, or both, and 58 (89%) papers expressing concerns about either reliability or bias, or both. We found that conversational LLMs exhibited promising results in summarization and providing general medical knowledge to patients with relatively high accuracy. However, conversational LLMs such as ChatGPT are not always able to provide reliable answers to complex health-related tasks (eg, diagnosis) that require specialized domain expertise. While bias or privacy issues are often noted as concerns, no experiments in our reviewed papers thoughtfully examined how conversational LLMs lead to these issues in health care research. Conclusions: Future studies should focus on improving the reliability of LLM applications in complex health-related tasks, as well as investigating the mechanisms by which LLM applications introduce bias and privacy issues. Considering the vast accessibility of LLMs, legal, social, and technical efforts are all needed to address concerns about LLMs to promote, improve, and regularize the application of LLMs in health care. 
UR - https://www.jmir.org/2024/1/e22769 UR - http://dx.doi.org/10.2196/22769 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/22769 ER - TY - JOUR AU - Fajnerova, Iveta AU - Hejtmánek, Lukáš AU - Sedlák, Michal AU - Jablonská, Markéta AU - Francová, Anna AU - Stopková, Pavla PY - 2024/11/7 TI - The Journey From Nonimmersive to Immersive Multiuser Applications in Mental Health Care: Systematic Review JO - J Med Internet Res SP - e60441 VL - 26 KW - digital health KW - mental health care KW - clinical interventions KW - multiuser KW - immersive KW - virtual reality KW - VR KW - app KW - mental health KW - online tools KW - synthesis KW - mobile phone KW - PRISMA N2 - Background: Over the past 25 years, the development of multiuser applications has seen considerable advancements and challenges. The technological development in this field has emerged from simple chat rooms through videoconferencing tools to the creation of complex, interactive, and often multisensory virtual worlds. These multiuser technologies have gradually found their way into mental health care, where they are used in both dyadic counseling and group interventions. However, some limitations in hardware capabilities, user experience designs, and scalability may have hindered the effectiveness of these applications. Objective: This systematic review aims at summarizing the progress made and the potential future directions in this field while evaluating various factors and perspectives relevant to remote multiuser interventions. Methods: The systematic review was performed based on a Web of Science and PubMed database search covering articles in English, published from January 1999 to March 2024, related to multiuser mental health interventions. Several inclusion and exclusion criteria were determined before and during the records screening process, which was performed in several steps. 
Results: We identified 49 records exploring multiuser applications in mental health care, ranging from text-based interventions to interventions set in fully immersive environments. The number of publications exploring this topic has been growing since 2015, with a large increase during the COVID-19 pandemic. Most digital interventions were delivered in the form of videoconferencing, with only a few implementing immersive environments. The studies used professional or peer-supported group interventions or a combination of both approaches. The research studies targeted diverse groups and topics, from nursing mothers to psychiatric disorders or various minority groups. Most group sessions occurred weekly, or in the case of the peer-support groups, often with a flexible schedule. Conclusions: We identified many benefits to multiuser digital interventions for mental health care. These approaches provide distributed, always available, and affordable peer support that can be used to deliver necessary help to people living outside of areas where in-person interventions are easily available. While immersive virtual environments have become a common tool in many areas of psychiatric care, such as exposure therapy, our results suggest that this technology in multiuser settings is still in its early stages. Most identified studies investigated mainstream technologies, such as videoconferencing or text-based support, substituting the immersive experience for convenience and ease of use. While many studies discuss useful features of virtual environments in group interventions, such as anonymity or stronger engagement with the group, we discuss persisting issues with these technologies, which currently prevent their full adoption. UR - https://www.jmir.org/2024/1/e60441 UR - http://dx.doi.org/10.2196/60441 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/60441 ER - TY - JOUR AU - Chow, L. James C. 
AU - Li, Kay PY - 2024/11/6 TI - Ethical Considerations in Human-Centered AI: Advancing Oncology Chatbots Through Large Language Models JO - JMIR Bioinform Biotech SP - e64406 VL - 5 KW - artificial intelligence KW - humanistic AI KW - ethical AI KW - human-centered AI KW - machine learning KW - large language models KW - natural language processing KW - oncology chatbot KW - transformer-based model KW - ChatGPT KW - health care UR - https://bioinform.jmir.org/2024/1/e64406 UR - http://dx.doi.org/10.2196/64406 UR - http://www.ncbi.nlm.nih.gov/pubmed/39321336 ID - info:doi/10.2196/64406 ER - TY - JOUR AU - Waldock, J. William AU - Zhang, Joe AU - Guni, Ahmad AU - Nabeel, Ahmad AU - Darzi, Ara AU - Ashrafian, Hutan PY - 2024/11/5 TI - The Accuracy and Capability of Artificial Intelligence Solutions in Health Care Examinations and Certificates: Systematic Review and Meta-Analysis JO - J Med Internet Res SP - e56532 VL - 26 KW - large language model KW - LLM KW - artificial intelligence KW - AI KW - health care exam KW - narrative medical response KW - health care examination KW - clinical commissioning KW - health services KW - safety N2 - Background: Large language models (LLMs) have dominated public interest due to their apparent capability to accurately replicate learned knowledge in narrative text. However, there is a lack of clarity about the accuracy and capability standards of LLMs in health care examinations. Objective: We conducted a systematic review of LLM accuracy, as tested under health care examination conditions, as compared to known human performance standards. Methods: We quantified the accuracy of LLMs in responding to health care examination questions and evaluated the consistency and quality of study reporting. The search included all papers up until September 10, 2023, with all LLMs published in English journals that report clear LLM accuracy standards. 
The exclusion criteria were as follows: the assessment was not a health care exam, there was no LLM, there was no evaluation of comparable success accuracy, and the literature was not original research. The literature search included the following Medical Subject Headings (MeSH) terms used in all possible combinations: "artificial intelligence," "ChatGPT," "GPT," "LLM," "large language model," "machine learning," "neural network," "Generative Pre-trained Transformer," "Generative Transformer," "Generative Language Model," "Generative Model," "medical exam," "healthcare exam," and "clinical exam." Sensitivity, accuracy, and precision data were extracted, including relevant CIs. Results: The search identified 1673 relevant citations. After removing duplicate results, 1268 (75.8%) papers were screened for titles and abstracts, and 32 (2.5%) studies were included for full-text review. Our meta-analysis suggested that LLMs are able to perform with an overall medical examination accuracy of 0.61 (CI 0.58-0.64) and a United States Medical Licensing Examination (USMLE) accuracy of 0.51 (CI 0.46-0.56), while Chat Generative Pretrained Transformer (ChatGPT) can perform with an overall medical examination accuracy of 0.64 (CI 0.6-0.67). Conclusions: LLMs offer promise to remediate health care demand and staffing challenges by providing accurate and efficient context-specific information to critical decision makers. For policy and deployment decisions about LLMs to advance health care, we proposed a new framework called RUBRICC (Regulatory, Usability, Bias, Reliability [Evidence and Safety], Interoperability, Cost, and Codesign-Patient and Public Involvement and Engagement [PPIE]). This presents a valuable opportunity to direct the clinical commissioning of new LLM capabilities into health services, while respecting patient safety considerations. 
Trial Registration: OSF Registries osf.io/xqzkw; https://osf.io/xqzkw UR - https://www.jmir.org/2024/1/e56532 UR - http://dx.doi.org/10.2196/56532 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/56532 ER - TY - JOUR AU - Alli, Rabia Sauliha AU - Hossain, Qahhār Soaad AU - Das, Sunit AU - Upshur, Ross PY - 2024/11/4 TI - The Potential of Artificial Intelligence Tools for Reducing Uncertainty in Medicine and Directions for Medical Education JO - JMIR Med Educ SP - e51446 VL - 10 KW - artificial intelligence KW - machine learning KW - uncertainty KW - clinical decision-making KW - medical education KW - generative AI KW - generative artificial intelligence UR - https://mededu.jmir.org/2024/1/e51446 UR - http://dx.doi.org/10.2196/51446 ID - info:doi/10.2196/51446 ER - TY - JOUR AU - Yau, Yi-Shin Jonathan AU - Saadat, Soheil AU - Hsu, Edmund AU - Murphy, Suk-Ling Linda AU - Roh, S. Jennifer AU - Suchard, Jeffrey AU - Tapia, Antonio AU - Wiechmann, Warren AU - Langdorf, I. Mark PY - 2024/11/4 TI - Accuracy of Prospective Assessments of 4 Large Language Model Chatbot Responses to Patient Questions About Emergency Care: Experimental Comparative Study JO - J Med Internet Res SP - e60291 VL - 26 KW - artificial intelligence KW - AI KW - chatbots KW - generative AI KW - natural language processing KW - consumer health information KW - patient education KW - literacy KW - emergency care information KW - chatbot KW - misinformation KW - health care KW - medical consultation N2 - Background: Recent surveys indicate that 48% of consumers actively use generative artificial intelligence (AI) for health-related inquiries. Despite widespread adoption and the potential to improve health care access, scant research examines the performance of AI chatbot responses regarding emergency care advice. Objective: We assessed the quality of AI chatbot responses to common emergency care questions. 
We sought to determine qualitative differences in responses from 4 free-access AI chatbots, for 10 different serious and benign emergency conditions. Methods: We created 10 emergency care questions that we fed into the free-access versions of ChatGPT 3.5 (OpenAI), Google Bard, Bing AI Chat (Microsoft), and Claude AI (Anthropic) on November 26, 2023. Each response was graded by 5 board-certified emergency medicine (EM) faculty for 8 domains of percentage accuracy, presence of dangerous information, factual accuracy, clarity, completeness, understandability, source reliability, and source relevancy. We determined the correct, complete response to the 10 questions from reputable and scholarly emergency medical references. These were compiled by an EM resident physician. For the readability of the chatbot responses, we used the Flesch-Kincaid Grade Level of each response from readability statistics embedded in Microsoft Word. Differences between chatbots were determined by the chi-square test. Results: Each of the 4 chatbots' responses to the 10 clinical questions were scored across 8 domains by 5 EM faculty, for 400 assessments for each chatbot. Together, the 4 chatbots had the best performance in clarity and understandability (both 85%), intermediate performance in accuracy and completeness (both 50%), and poor performance (10%) for source relevance and reliability (mostly unreported). Chatbots contained dangerous information in 5% to 35% of responses, with no statistical difference between chatbots on this metric (P=.24). ChatGPT, Google Bard, and Claude AI had similar performances across 6 out of 8 domains. Only Bing AI performed better with more identified or relevant sources (40%; the others had 0%-10%). The Flesch-Kincaid Grade Level was 7.7-8.9 for all chatbots, except ChatGPT at 10.8, which were all too advanced for average emergency patients. 
Responses included both dangerous (eg, starting cardiopulmonary resuscitation with no pulse check) and generally inappropriate advice (eg, loosening the collar to improve breathing without evidence of airway compromise). Conclusions: AI chatbots, though ubiquitous, have significant deficiencies in EM patient advice, despite relatively consistent performance. Information for when to seek urgent or emergent care is frequently incomplete and inaccurate, and patients may be unaware of misinformation. Sources are not generally provided. Patients who use AI to guide health care decisions assume potential risks. AI chatbots for health should be subject to further research, refinement, and regulation. We strongly recommend proper medical consultation to prevent potential adverse outcomes. UR - https://www.jmir.org/2024/1/e60291 UR - http://dx.doi.org/10.2196/60291 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/60291 ER - TY - JOUR AU - Coppersmith, DL Daniel AU - Bentley, H. Kate AU - Kleiman, M. Evan AU - Jaroszewski, C. Adam AU - Daniel, Merryn AU - Nock, K. Matthew PY - 2024/10/31 TI - Automated Real-Time Tool for Promoting Crisis Resource Use for Suicide Risk (ResourceBot): Development and Usability Study JO - JMIR Ment Health SP - e58409 VL - 11 KW - suicidal thoughts KW - suicidal behaviors KW - ecological momentary assessment KW - crisis resources KW - real-time tool KW - self-report KW - psychoeducation KW - app N2 - Background: Real-time monitoring captures information about suicidal thoughts and behaviors (STBs) as they occur and offers great promise to learn about STBs. However, this approach also introduces questions about how to monitor and respond to real-time information about STBs. Given the increasing use of real-time monitoring, there is a need for novel, effective, and scalable tools for responding to suicide risk in real time. 
Objective: The goal of this study was to develop and test an automated tool (ResourceBot) that promotes the use of crisis services (eg, 988) in real time through a rule-based (ie, if-then) brief barrier reduction intervention. Methods: ResourceBot was tested in a 2-week real-time monitoring study of 74 adults with recent suicidal thoughts. Results: ResourceBot was deployed 221 times to 36 participants. There was high engagement with ResourceBot (ie, 87% of the time ResourceBot was deployed, a participant opened the tool and submitted a response to it), but zero participants reported using crisis services after engaging with ResourceBot. The most reported reasons for not using crisis services were beliefs that the resources would not help, wanting to handle things on one's own, and the resources requiring too much time or effort. At the end of the study, participants rated ResourceBot with good usability (mean of 75.6 out of 100) and satisfaction (mean of 20.8 out of 32). Conclusions: This study highlights both the possibilities and challenges of developing effective real-time interventions for suicide risk and areas for refinement in future work. UR - https://mental.jmir.org/2024/1/e58409 UR - http://dx.doi.org/10.2196/58409 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/58409 ER - TY - JOUR AU - Tao, Wenjuan AU - Yang, Jinming AU - Qu, Xing PY - 2024/10/28 TI - Utilization of, Perceptions on, and Intention to Use AI Chatbots Among Medical Students in China: National Cross-Sectional Study JO - JMIR Med Educ SP - e57132 VL - 10 KW - medical education KW - artificial intelligence KW - UTAUT model KW - utilization KW - medical students KW - cross-sectional study KW - AI chatbots KW - China KW - acceptance KW - electronic survey KW - social media KW - medical information KW - risk KW - training KW - support N2 - Background: Artificial intelligence (AI) chatbots are poised to have a profound impact on medical education. 
Medical students, as early adopters of technology and future health care providers, play a crucial role in shaping the future of health care. However, little is known about the utilization of, perceptions on, and intention to use AI chatbots among medical students in China. Objective: This study aims to explore the utilization of, perceptions on, and intention to use generative AI chatbots among medical students in China, using the Unified Theory of Acceptance and Use of Technology (UTAUT) framework. By conducting a national cross-sectional survey, we sought to identify the key determinants that influence medical students' acceptance of AI chatbots, thereby providing a basis for enhancing their integration into medical education. Understanding these factors is crucial for educators, policy makers, and technology developers to design and implement effective AI-driven educational tools that align with the needs and expectations of future health care professionals. Methods: A web-based electronic survey questionnaire was developed and distributed via social media to medical students across the country. The UTAUT was used as a theoretical framework to design the questionnaire and analyze the data. The relationship between behavioral intention to use AI chatbots and UTAUT predictors was examined using multivariable regression. Results: A total of 693 participants were from 57 universities covering 21 provinces or municipalities in China. Only a minority (199/693, 28.72%) reported using AI chatbots for studying, with ChatGPT (129/693, 18.61%) being the most commonly used. Most of the participants used AI chatbots for quickly obtaining medical information and knowledge (631/693, 91.05%) and increasing learning efficiency (594/693, 85.71%). Utilization behavior, social influence, facilitating conditions, perceived risk, and personal innovativeness showed significant positive associations with the behavioral intention to use AI chatbots (all P values were <.05). 
Conclusions: Chinese medical students hold positive perceptions toward and high intentions to use AI chatbots, but there are gaps between intention and actual adoption. This highlights the need for strategies to improve access, training, and support and provide peer usage examples to fully harness the potential benefits of chatbot technology. UR - https://mededu.jmir.org/2024/1/e57132 UR - http://dx.doi.org/10.2196/57132 ID - info:doi/10.2196/57132 ER - TY - JOUR AU - Kim, Youlim AU - Lee, Hyeonkyeong AU - Park, Jeongok AU - Kim, Yong-Chan AU - Kim, Hee Dong AU - Lee, Young-Me PY - 2024/10/28 TI - eHealth Communication Intervention to Promote Human Papillomavirus Vaccination Among Middle-School Girls: Development and Usability Study JO - JMIR Form Res SP - e59087 VL - 8 KW - cervical cancer KW - human papillomavirus KW - vaccines KW - health communication KW - chatbot KW - artificial intelligence KW - adolescent KW - mobile phone N2 - Background: As the age of initiating sexual intercourse has gradually decreased among South Korean adolescents, earlier vaccination of adolescents for human papillomavirus (HPV) is necessary before their exposure to HPV. Health communication includes "cues to action" that lead to preventive health behaviors, and recently, social networking services, which operate with fewer time and space constraints, have been used in various studies as a form of eHealth communication. Objective: This study aims to investigate the feasibility and usability of an eHealth communication intervention for HPV vaccination in middle-school girls aimed at the girls and their mothers. Methods: The eHealth communication intervention for HPV vaccination was developed using a 6-step intervention mapping process: needs assessments, setting program outcomes, selection of a theory-based method and practical strategies, development of the intervention, implementation plan, and testing the validity of the intervention. 
Results: A review of 10 studies identified effective health communication messages, delivery methods, and theories for HPV vaccination among adolescents. Barriers including low knowledge, perceived threat, and the inconvenience of taking 2 doses of the vaccine were identified through focus groups, suggesting a need for youth-friendly and easy-to-understand information for adolescents delivered via mobile phones. The expected outcomes and the performance objectives are specifically tailored to reflect the vaccination intention. Behavior change techniques were applied using trusted sources and a health belief model. Health messages delivered through a KakaoTalk chatbot improved awareness and self-efficacy. Quality control was ensured with the use of a log system. The experts' average chatbot usability score was 80.13 (SD 8.15), and the girls' average score was 84.06 (SD 7.61). Conclusions: Future studies need to verify the effectiveness of health communication strategies in promoting HPV vaccination and the effectiveness of scientific intervention using a chatbot as a delivery method for the intervention. UR - https://formative.jmir.org/2024/1/e59087 UR - http://dx.doi.org/10.2196/59087 UR - http://www.ncbi.nlm.nih.gov/pubmed/39466304 ID - info:doi/10.2196/59087 ER - TY - JOUR AU - Dana, Zara AU - Nagra, Harpreet AU - Kilby, Kimberly PY - 2024/10/25 TI - Role of Synchronous, Moderated, and Anonymous Peer Support Chats on Reducing Momentary Loneliness in Older Adults: Retrospective Observational Study JO - JMIR Form Res SP - e59501 VL - 8 KW - digital peer support KW - social loneliness KW - chat-based interactions KW - older adults N2 - Background: Older adults have a high rate of loneliness, which contributes to increased psychosocial risk, medical morbidity, and mortality. Digital emotional support interventions provide a convenient and rapid avenue for additional support. 
Digital peer support interventions for emotional struggles contrast the usual provider-based clinical care models because they offer more accessible, direct support for empowerment, highlighting the users' autonomy, competence, and relatedness. Objective: This study aims to examine a novel anonymous and synchronous peer-to-peer digital chat service facilitated by trained human moderators. The experience of a cohort of 699 adults aged ≥65 years was analyzed to determine (1) if participation, alone, led to measurable aggregate change in momentary loneliness and optimism and (2) the impact of peers on momentary loneliness and optimism. Methods: Participants were each prompted with a single question: "What's your struggle?" Using a proprietary artificial intelligence model, the free-text response automatched the respondent based on their self-expressed emotional struggle to peers and a chat moderator. Exchanged messages were analyzed to quantitatively measure the change in momentary loneliness and optimism using a third-party, public, natural language processing model (GPT-4 [OpenAI]). The sentiment change analysis was initially performed at the individual level and then averaged across all users with similar emotion types to produce a statistically significant (P<.05) collective trend per emotion. To evaluate the peer impact on momentary loneliness and optimism, we performed propensity matching to align the moderator+single user and moderator+small group chat cohorts and then compare the emotion trends between the matched cohorts. Results: Loneliness and optimism trends significantly improved after 8 (P=.02) to 9 minutes (P=.03) into the chat. We observed a significant improvement in the momentary loneliness and optimism trends between the moderator+small group compared to the moderator+single user chat cohort after 19 (P=.049) and 21 minutes (P=.04) for optimism and loneliness, respectively. 
Conclusions: Chat-based peer support may be a viable intervention to help address momentary loneliness in older adults and present an alternative to traditional care. The promising results support the need for further study to expand the evidence for such cost-effective options. UR - https://formative.jmir.org/2024/1/e59501 UR - http://dx.doi.org/10.2196/59501 UR - http://www.ncbi.nlm.nih.gov/pubmed/39453688 ID - info:doi/10.2196/59501 ER - TY - JOUR AU - So, Jae-hee AU - Chang, Joonhwan AU - Kim, Eunji AU - Na, Junho AU - Choi, JiYeon AU - Sohn, Jy-yong AU - Kim, Byung-Hoon AU - Chu, Hui Sang PY - 2024/10/24 TI - Aligning Large Language Models for Enhancing Psychiatric Interviews Through Symptom Delineation and Summarization: Pilot Study JO - JMIR Form Res SP - e58418 VL - 8 KW - large language model KW - psychiatric interview KW - interview summarization KW - symptom delineation N2 - Background: Recent advancements in large language models (LLMs) have accelerated their use across various domains. Psychiatric interviews, which are goal-oriented and structured, represent a significantly underexplored area where LLMs can provide substantial value. In this study, we explore the application of LLMs to enhance psychiatric interviews by analyzing counseling data from North Korean defectors who have experienced traumatic events and mental health issues. Objective: This study aims to investigate whether LLMs can (1) delineate parts of the conversation that suggest psychiatric symptoms and identify those symptoms, and (2) summarize stressors and symptoms based on the interview dialogue transcript. Methods: Given the interview transcripts, we align the LLMs to perform 3 tasks: (1) extracting stressors from the transcripts, (2) delineating symptoms and their indicative sections, and (3) summarizing the patients based on the extracted stressors and symptoms. 
These 3 tasks address the 2 objectives, where delineating symptoms is based on the output from the second task, and generating the summary of the interview incorporates the outputs from all 3 tasks. In this context, the transcript data were labeled by mental health experts for the training and evaluation of the LLMs. Results: First, we present the performance of LLMs in estimating (1) the transcript sections related to psychiatric symptoms and (2) the names of the corresponding symptoms. In the zero-shot inference setting using the GPT-4 Turbo model, 73 out of 102 transcript segments demonstrated a recall mid-token distance d<20 for estimating the sections associated with the symptoms. For evaluating the names of the corresponding symptoms, the fine-tuning method demonstrates a performance advantage over the zero-shot inference setting of the GPT-4 Turbo model. On average, the fine-tuning method achieves an accuracy of 0.82, a precision of 0.83, a recall of 0.82, and an F1-score of 0.82. Second, the transcripts are used to generate summaries for each interviewee using LLMs. This generative task was evaluated using metrics such as Generative Evaluation (G-Eval) and Bidirectional Encoder Representations from Transformers Score (BERTScore). The summaries generated by the GPT-4 Turbo model, utilizing both symptom and stressor information, achieve high average G-Eval scores: coherence of 4.66, consistency of 4.73, fluency of 2.16, and relevance of 4.67. Furthermore, it is noted that the use of retrieval-augmented generation did not lead to a significant improvement in performance. Conclusions: LLMs, using either (1) appropriate prompting techniques or (2) fine-tuning methods with data labeled by mental health experts, achieved an accuracy of over 0.8 for the symptom delineation task when measured across all segments in the transcript. Additionally, they attained a G-Eval score of over 4.6 for coherence in the summarization task. 
This research contributes to the emerging field of applying LLMs in psychiatric interviews and demonstrates their potential effectiveness in assisting mental health practitioners. UR - https://formative.jmir.org/2024/1/e58418 UR - http://dx.doi.org/10.2196/58418 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/58418 ER - TY - JOUR AU - Achtari, Margaux AU - Salihu, Adil AU - Muller, Olivier AU - Abbé, Emmanuel AU - Clair, Carole AU - Schwarz, Joëlle AU - Fournier, Stephane PY - 2024/10/22 TI - Gender Bias in AI's Perception of Cardiovascular Risk JO - J Med Internet Res SP - e54242 VL - 26 KW - artificial intelligence KW - gender equity KW - coronary artery disease KW - AI KW - cardiovascular KW - risk KW - CAD KW - artery KW - coronary KW - chatbot: health care KW - men: women KW - gender bias KW - gender UR - https://www.jmir.org/2024/1/e54242 UR - http://dx.doi.org/10.2196/54242 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/54242 ER - TY - JOUR AU - Guo, Zhijun AU - Lai, Alvina AU - Thygesen, H. Johan AU - Farrington, Joseph AU - Keen, Thomas AU - Li, Kezhi PY - 2024/10/18 TI - Large Language Models for Mental Health Applications: Systematic Review JO - JMIR Ment Health SP - e57400 VL - 11 KW - large language models KW - mental health KW - digital health care KW - ChatGPT KW - Bidirectional Encoder Representations from Transformers KW - BERT N2 - Background: Large language models (LLMs) are advanced artificial neural networks trained on extensive datasets to accurately understand and generate natural language. While they have received much attention and demonstrated potential in digital health, their application in mental health, particularly in clinical settings, has generated considerable debate. Objective: This systematic review aims to critically assess the use of LLMs in mental health, specifically focusing on their applicability and efficacy in early screening, digital interventions, and clinical settings. 
By systematically collating and assessing the evidence from current studies, our work analyzes models, methodologies, data sources, and outcomes, thereby highlighting the potential of LLMs in mental health, the challenges they present, and the prospects for their clinical use. Methods: Adhering to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, this review searched 5 open-access databases: MEDLINE (accessed by PubMed), IEEE Xplore, Scopus, JMIR, and ACM Digital Library. Keywords used were (mental health OR mental illness OR mental disorder OR psychiatry) AND (large language models). This study included articles published between January 1, 2017, and April 30, 2024, and excluded articles published in languages other than English. Results: In total, 40 articles were evaluated, including 15 (38%) articles on mental health conditions and suicidal ideation detection through text analysis, 7 (18%) on the use of LLMs as mental health conversational agents, and 18 (45%) on other applications and evaluations of LLMs in mental health. LLMs show good effectiveness in detecting mental health issues and providing accessible, destigmatized eHealth services. However, assessments also indicate that the current risks associated with clinical use might surpass their benefits. These risks include inconsistencies in generated text; the production of hallucinations; and the absence of a comprehensive, benchmarked ethical framework. Conclusions: This systematic review examines the clinical applications of LLMs in mental health, highlighting their potential and inherent risks. The study identifies several issues: the lack of multilingual datasets annotated by experts, concerns regarding the accuracy and reliability of generated content, challenges in interpretability due to the "black box" nature of LLMs, and ongoing ethical dilemmas. 
These ethical concerns include the absence of a clear, benchmarked ethical framework; data privacy issues; and the potential for overreliance on LLMs by both physicians and patients, which could compromise traditional medical practices. As a result, LLMs should not be considered substitutes for professional mental health services. However, the rapid development of LLMs underscores their potential as valuable clinical aids, emphasizing the need for continued research and development in this area. Trial Registration: PROSPERO CRD42024508617; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=508617 UR - https://mental.jmir.org/2024/1/e57400 UR - http://dx.doi.org/10.2196/57400 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/57400 ER - TY - JOUR AU - Fietta, Valentina AU - Rizzi, Silvia AU - De Luca, Chiara AU - Gios, Lorenzo AU - Pavesi, Chiara Maria AU - Gabrielli, Silvia AU - Monaro, Merylin AU - Forti, Stefano PY - 2024/10/18 TI - A Chatbot-Based Version of the World Health Organization–Validated Self-Help Plus Intervention for Stress Management: Co-Design and Usability Testing JO - JMIR Hum Factors SP - e64614 VL - 11 KW - acceptance and commitment therapy KW - ACT KW - well-being KW - pregnancy KW - breast cancer KW - eHealth KW - mobile health KW - mHealth KW - development KW - usability KW - user-centered design N2 - Background: Advancements in technology offer new opportunities to support vulnerable populations, such as pregnant women and women diagnosed with breast cancer, during physiologically and psychologically stressful periods. Objective: This study aims to adapt and co-design the World Health Organization's Self-Help Plus intervention into a mobile health intervention for these target groups. Methods: On the basis of the Obesity-Related Behavioral Intervention Trials and Center for eHealth Research and Disease Management models, low-fidelity and high-fidelity prototypes were developed.
Prototypes were evaluated by 13 domain experts from diverse sectors and 15 participants from the target groups to assess usability, attractiveness, and functionality through semantic differential scales, the User Version of the Mobile Application Rating Scale questionnaire, and semistructured interviews. Results: Feedback from participants indicated positive perceptions of the mobile health intervention, highlighting its ease of use, appropriate language, and attractive multimedia content. Areas identified for improvement included enhancing user engagement through reminders, monitoring features, and increased personalization. The quality of the content and adherence to initial protocols were positively evaluated. Conclusions: This research provides valuable insights for future studies aiming to enhance the usability, efficacy, and effectiveness of the app, suggesting the potential role of a chatbot-delivered Self-Help Plus intervention as a supportive tool for pregnant women and women with a breast cancer diagnosis. 
UR - https://humanfactors.jmir.org/2024/1/e64614 UR - http://dx.doi.org/10.2196/64614 UR - http://www.ncbi.nlm.nih.gov/pubmed/39355954 ID - info:doi/10.2196/64614 ER - TY - JOUR AU - Golden, Ashleigh AU - Aboujaoude, Elias PY - 2024/10/18 TI - Describing the Framework for AI Tool Assessment in Mental Health and Applying It to a Generative AI Obsessive-Compulsive Disorder Platform: Tutorial JO - JMIR Form Res SP - e62963 VL - 8 KW - artificial intelligence KW - ChatGPT KW - generative artificial intelligence KW - generative AI KW - large language model KW - chatbots KW - machine learning KW - digital health KW - telemedicine KW - psychotherapy KW - obsessive-compulsive disorder UR - https://formative.jmir.org/2024/1/e62963 UR - http://dx.doi.org/10.2196/62963 UR - http://www.ncbi.nlm.nih.gov/pubmed/39423001 ID - info:doi/10.2196/62963 ER - TY - JOUR AU - Jo, Eunbeen AU - Yoo, Hakje AU - Kim, Jong-Ho AU - Kim, Young-Min AU - Song, Sanghoun AU - Joo, Joon Hyung PY - 2024/10/18 TI - Fine-Tuned Bidirectional Encoder Representations From Transformers Versus ChatGPT for Text-Based Outpatient Department Recommendation: Comparative Study JO - JMIR Form Res SP - e47814 VL - 8 KW - natural language processing KW - bidirectional encoder representations from transformers KW - large language model KW - generative pretrained transformer KW - medical specialty prediction KW - quality of care KW - health care application KW - ChatGPT KW - BERT KW - AI technology KW - conversational agent KW - AI KW - artificial intelligence KW - chatbot KW - application KW - health care N2 - Background: Patients often struggle with determining which outpatient specialist to consult based on their symptoms. Natural language processing models in health care offer the potential to assist patients in making these decisions before visiting a hospital. Objective: This study aimed to evaluate the performance of ChatGPT in recommending medical specialties for medical questions. 
Methods: We used a dataset of 31,482 medical questions, each answered by doctors and labeled with the appropriate medical specialty from the health consultation board of NAVER (NAVER Corp), a major Korean portal. This dataset includes 27 distinct medical specialty labels. We compared the performance of the fine-tuned Korean Medical bidirectional encoder representations from transformers (KM-BERT) and ChatGPT models by analyzing their ability to accurately recommend medical specialties. We categorized responses from ChatGPT into those matching the 27 predefined specialties and those that did not. Both models were evaluated using performance metrics of accuracy, precision, recall, and F1-score. Results: ChatGPT demonstrated an answer avoidance rate of 6.2% but provided accurate medical specialty recommendations with explanations that elucidated the underlying pathophysiology of the patient's symptoms. It achieved an accuracy of 0.939, precision of 0.219, recall of 0.168, and an F1-score of 0.134. In contrast, the KM-BERT model, fine-tuned for the same task, outperformed ChatGPT with an accuracy of 0.977, precision of 0.570, recall of 0.652, and an F1-score of 0.587. Conclusions: Although ChatGPT did not surpass the fine-tuned KM-BERT model in recommending the correct medical specialties, it showcased notable advantages as a conversational artificial intelligence model. By providing detailed, contextually appropriate explanations, ChatGPT has the potential to significantly enhance patient comprehension of medical information, thereby improving the medical referral process.
UR - https://formative.jmir.org/2024/1/e47814 UR - http://dx.doi.org/10.2196/47814 UR - http://www.ncbi.nlm.nih.gov/pubmed/39423004 ID - info:doi/10.2196/47814 ER - TY - JOUR AU - Peng, Wei AU - Lee, Rin Hee AU - Lim, Sue PY - 2024/10/11 TI - Leveraging Chatbots to Combat Health Misinformation for Older Adults: Participatory Design Study JO - JMIR Form Res SP - e60712 VL - 8 KW - chatbot KW - conversational agent KW - older adults KW - health misinformation KW - participatory design N2 - Background: Older adults, a population particularly susceptible to misinformation, may experience attempts at health-related scams or defrauding, and they may unknowingly spread misinformation. Previous research has investigated managing misinformation through media literacy education or supporting users by fact-checking information and cautioning for potential misinformation content, yet studies focusing on older adults are limited. Chatbots have the potential to educate and support older adults in misinformation management. However, many studies focusing on designing technology for older adults use the needs-based approach and consider aging as a deficit, leading to issues in technology adoption. Instead, we adopted the asset-based approach, inviting older adults to be active collaborators in envisioning how intelligent technologies can enhance their misinformation management practices. Objective: This study aims to understand how older adults may use chatbots' capabilities for misinformation management. Methods: We conducted 5 participatory design workshops with a total of 17 older adult participants to ideate ways in which chatbots can help them manage misinformation. The workshops included 3 stages: developing scenarios reflecting older adults' encounters with misinformation in their lives, understanding existing chatbot platforms, and envisioning how chatbots can help intervene in the scenarios from stage 1. Results: We found that issues with older adults'
misinformation management arose more from interpersonal relationships than individuals' ability to detect misinformation in pieces of information. This finding underscored the importance of chatbots to act as mediators that facilitate communication and help resolve conflict. In addition, participants emphasized the importance of autonomy. They desired chatbots to teach them to navigate the information landscape and come to conclusions about misinformation on their own. Finally, we found that older adults' distrust in IT companies and governments' ability to regulate the IT industry affected their trust in chatbots. Thus, chatbot designers should consider using well-trusted sources and practicing transparency to increase older adults' trust in the chatbot-based tools. Overall, our results highlight the need for chatbot-based misinformation tools to go beyond fact checking. Conclusions: This study provides insights for how chatbots can be designed as part of technological systems for misinformation management among older adults. Our study underscores the importance of inviting older adults to be active co-designers of chatbot-based interventions. UR - https://formative.jmir.org/2024/1/e60712 UR - http://dx.doi.org/10.2196/60712 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/60712 ER - TY - JOUR AU - Miao, Jing AU - Thongprayoon, Charat AU - Garcia Valencia, Oscar AU - Craici, M.
Iasmina AU - Cheungpasitporn, Wisit PY - 2024/10/10 TI - Navigating Nephrology's Decline Through a GPT-4 Analysis of Internal Medicine Specialties in the United States: Qualitative Study JO - JMIR Med Educ SP - e57157 VL - 10 KW - artificial intelligence KW - ChatGPT KW - nephrology fellowship training KW - fellowship matching KW - medical education KW - AI KW - nephrology KW - fellowship KW - United States KW - factor KW - chatbots KW - intellectual KW - complexity KW - work-life balance KW - procedural involvement KW - opportunity KW - career demand KW - financial compensation N2 - Background: The 2024 Nephrology fellowship match data show the declining interest in nephrology in the United States, with an 11% drop in candidates and a mere 66% (321/488) of positions filled. Objective: The study aims to discern the factors influencing this trend using ChatGPT, a leading chatbot model, for insights into the comparative appeal of nephrology versus other internal medicine specialties. Methods: Using the GPT-4 model, the study compared nephrology with 13 other internal medicine specialties, evaluating each on 7 criteria including intellectual complexity, work-life balance, procedural involvement, research opportunities, patient relationships, career demand, and financial compensation. Each criterion was assigned scores from 1 to 10, with the cumulative score determining the ranking. The approach included counteracting potential bias by instructing GPT-4 to favor other specialties over nephrology in reverse scenarios. Results: GPT-4 ranked nephrology only above sleep medicine. While nephrology scored higher than hospice and palliative medicine, it fell short in key criteria such as work-life balance, patient relationships, and career demand. When examining the percentage of filled positions in the 2024 appointment year match, nephrology's filled rate was 66%, only higher than the 45% (155/348) filled rate of geriatric medicine.
Nephrology's score decreased by 4%–14% in 5 criteria including intellectual challenge and complexity, procedural involvement, career opportunity and demand, research and academic opportunities, and financial compensation. Conclusions: ChatGPT does not favor nephrology over most internal medicine specialties, highlighting its diminishing appeal as a career choice. This trend raises significant concerns, especially considering the overall physician shortage, and prompts a reevaluation of factors affecting specialty choice among medical residents. UR - https://mededu.jmir.org/2024/1/e57157 UR - http://dx.doi.org/10.2196/57157 ID - info:doi/10.2196/57157 ER - TY - JOUR AU - Wegener, Kauffeldt Emilie AU - M Bergschöld, Jenny AU - Kramer, Tina AU - Schmidt, Wong Camilla AU - Borgnakke, Karen PY - 2024/10/8 TI - Co-Designing a Conversational Agent With Older Adults With Chronic Obstructive Pulmonary Disease Who Age in Place: Qualitative Study JO - JMIR Hum Factors SP - e63222 VL - 11 KW - eHealth KW - aging in place KW - digital health technology KW - health literacy KW - everyday life KW - co-design KW - co-designing KW - conversational agent KW - older adults KW - elderly KW - COPD KW - thematic analysis KW - design KW - development KW - interview data KW - cocreation KW - chronic obstructive pulmonary disease KW - mobile phone N2 - Background: As a reaction to the global demographic increase in older adults (aged 60+ years), policy makers call for initiatives to enable healthy aging. This includes a focus on person-centered care and access to long-term care for older adults, such as developing different services and digital health technologies. This can enable patients to engage in their health and reduce the burden on the health care systems and health care professionals. The European Union project Smart Inclusive Living Environments (SMILE) focuses on well-being and aging in place using new digital health technologies.
The novelty of the SMILE project is the use of a cocreational approach focused on the needs and preferences of older adults with chronic obstructive pulmonary disease (COPD) in technology development, to enhance access, adaptation, and usability and to reduce stigma. Objective: The study aimed to describe the perspective, needs, and preferences of older adults living with COPD in the context of the design and development of a conversational agent. Methods: This study carried out a data-driven thematic analysis of interview data from 11 cocreation workshops with 33 older adults living with COPD. Results: The three particular features that the workshop participants wanted to implement in a new technology were (1) a “my health” function, to use technology to manage and learn more about their condition; (2) a “daily activities” function, including an overview and information about social and physical activities in their local area; and (3) a “sleep” function, to manage circadian rhythm and enhance sleep quality, for example, through online video guides. In total, 2 overarching themes were identified for the 3 functions: measurements, which were actively discussed and received mixed interest among the participants, and health literacy, due to an overall interest in learning more about their condition in relation to everyday life. Conclusions: The future design of digital health technology must embrace the complexities of the everyday life of an older adult living with COPD and cater to their needs and preferences. Measurements should be optional and personalized, and digital solutions should be used as a supplement to health care professionals, not as a substitute. UR - https://humanfactors.jmir.org/2024/1/e63222 UR - http://dx.doi.org/10.2196/63222 UR - http://www.ncbi.nlm.nih.gov/pubmed/39378067 ID - info:doi/10.2196/63222 ER - TY - JOUR AU - Klapow, C.
Max AU - Rosenblatt, Andrew AU - Lachman, Jamie AU - Gardner, Frances PY - 2024/10/7 TI - The Feasibility and Acceptability of Using a Digital Conversational Agent (Chatbot) for Delivering Parenting Interventions: Systematic Review JO - JMIR Pediatr Parent SP - e55726 VL - 7 KW - chatbot KW - parenting intervention KW - feasibility KW - acceptability KW - systematic review KW - implementation N2 - Background: Parenting interventions are crucial for promoting family well-being, reducing violence against children, and improving child development outcomes; however, scaling these programs remains a challenge. Prior reviews have characterized the feasibility, acceptability, and effectiveness of other more robust forms of digital parenting interventions (eg, via the web, mobile apps, and videoconferencing). Recently, chatbot technology has emerged as a possible mode for adapting and delivering parenting programs to larger populations (eg, Parenting for Lifelong Health, Incredible Years, and Triple P Parenting). Objective: This study aims to review the evidence of using chatbots to deliver parenting interventions and assess the feasibility of implementation, acceptability of these interventions, and preliminary outcomes. Methods: This review conducted a comprehensive search of databases, including Web of Science, MEDLINE, Scopus, ProQuest, and Cochrane Central Register of Controlled Trials. Cochrane Handbook for Systematic Review of Interventions and PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines were used to conduct the search. Eligible studies targeted parents of children aged 0 to 18 years; used chatbots via digital platforms, such as the internet, mobile apps, or SMS text messaging; and targeted improving family well-being through parenting. Implementation measures, acceptability, and any reported preliminary measures of effectiveness were included. Results: Of the 1766 initial results, 10 studies met the inclusion criteria. 
The included studies, primarily conducted in high-income countries (8/10, 80%), demonstrated a high mean retention rate (72.8%) and reported high acceptability (10/10, 100%). However, significant heterogeneity in interventions, measurement methods, and study quality necessitate cautious interpretation. Reporting bias, lack of clarity in the operationalization of engagement measures, and platform limitations were identified as limiting factors in interpreting findings. Conclusions: This is the first study to review the implementation feasibility and acceptability of chatbots for delivering parenting programs. While preliminary evidence suggests that chatbots can be used to deliver parenting programs, further research, standardization of reporting, and scaling up of effectiveness testing are critical to harness the full benefits of chatbots for promoting family well-being. UR - https://pediatrics.jmir.org/2024/1/e55726 UR - http://dx.doi.org/10.2196/55726 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/55726 ER - TY - JOUR AU - Li, Xingang AU - Guo, Heng AU - Li, Dandan AU - Zheng, Yingming PY - 2024/10/4 TI - Engine of Innovation in Hospital Pharmacy: Applications and Reflections of ChatGPT JO - J Med Internet Res SP - e51635 VL - 26 KW - ChatGPT KW - hospital pharmacy KW - natural language processing KW - drug information KW - drug therapy KW - drug interaction KW - scientific research KW - innovation KW - pharmacy KW - quality KW - safety KW - pharmaceutical care KW - tool KW - medical care quality UR - https://www.jmir.org/2024/1/e51635 UR - http://dx.doi.org/10.2196/51635 UR - http://www.ncbi.nlm.nih.gov/pubmed/39365643 ID - info:doi/10.2196/51635 ER - TY - JOUR AU - Wu, Zelin AU - Gan, Wenyi AU - Xue, Zhaowen AU - Ni, Zhengxin AU - Zheng, Xiaofei AU - Zhang, Yiyi PY - 2024/10/3 TI - Performance of ChatGPT on Nursing Licensure Examinations in the United States and China: Cross-Sectional Study JO - JMIR Med Educ SP - e52746 VL - 10 KW - artificial 
intelligence KW - ChatGPT KW - nursing licensure examination KW - nursing KW - LLMs KW - large language models KW - nursing education KW - AI KW - nursing student KW - large language model KW - licensing KW - observation KW - observational study KW - China KW - USA KW - United States of America KW - auxiliary tool KW - accuracy rate KW - theoretical N2 - Background: The creation of large language models (LLMs) such as ChatGPT is an important step in the development of artificial intelligence, which shows great potential in medical education due to its powerful language understanding and generative capabilities. The purpose of this study was to quantitatively evaluate and comprehensively analyze ChatGPT's performance in handling questions for the National Nursing Licensure Examination (NNLE) in China and the United States, including the National Council Licensure Examination for Registered Nurses (NCLEX-RN) and the NNLE. Objective: This study aims to examine how well LLMs respond to the NCLEX-RN and the NNLE multiple-choice questions (MCQs) in various language inputs, to evaluate whether LLMs can be used as multilingual learning assistance for nursing, and to assess whether they possess a repository of professional knowledge applicable to clinical nursing practice. Methods: First, we compiled 150 NCLEX-RN Practical MCQs, 240 NNLE Theoretical MCQs, and 240 NNLE Practical MCQs. Then, the translation function of ChatGPT 3.5 was used to translate NCLEX-RN questions from English to Chinese and NNLE questions from Chinese to English. Finally, the original version and the translated version of the MCQs were inputted into ChatGPT 4.0, ChatGPT 3.5, and Google Bard. Different LLMs were compared according to the accuracy rate, and the differences between different language inputs were compared. Results: The accuracy rates of ChatGPT 4.0 for NCLEX-RN practical questions and Chinese-translated NCLEX-RN practical questions were 88.7% (133/150) and 79.3% (119/150), respectively.
Despite the statistical significance of the difference (P=.03), the accuracy rate was generally satisfactory. Around 71.9% (169/235) of NNLE Theoretical MCQs and 69.1% (161/233) of NNLE Practical MCQs were correctly answered by ChatGPT 4.0. The accuracy of ChatGPT 4.0 in processing NNLE Theoretical MCQs and NNLE Practical MCQs translated into English was 71.5% (168/235; P=.92) and 67.8% (158/233; P=.77), respectively, and there was no statistically significant difference between the results of text input in different languages. ChatGPT 3.5 (NCLEX-RN P=.003, NNLE Theoretical P<.001, NNLE Practical P=.12) and Google Bard (NCLEX-RN P<.001, NNLE Theoretical P<.001, NNLE Practical P<.001) had lower accuracy rates for nursing-related MCQs than ChatGPT 4.0 in English input. English accuracy was higher when compared with ChatGPT 3.5's Chinese input, and the difference was statistically significant (NCLEX-RN P=.02, NNLE Practical P=.02). Whether submitted in Chinese or English, the MCQs from the NCLEX-RN and NNLE demonstrated that ChatGPT 4.0 had the highest number of unique correct responses and the lowest number of unique incorrect responses among the 3 LLMs. Conclusions: This study, focusing on 618 nursing MCQs including NCLEX-RN and NNLE exams, found that ChatGPT 4.0 outperformed ChatGPT 3.5 and Google Bard in accuracy. It excelled in processing English and Chinese inputs, underscoring its potential as a valuable tool in nursing education and clinical decision-making. UR - https://mededu.jmir.org/2024/1/e52746 UR - http://dx.doi.org/10.2196/52746 ID - info:doi/10.2196/52746 ER - TY - JOUR AU - Armbruster, Jonas AU - Bussmann, Florian AU - Rothhaas, Catharina AU - Titze, Nadine AU - Grützner, Alfred Paul AU - Freischmidt, Holger PY - 2024/10/1 TI - “Doctor ChatGPT, Can You Help Me?”
The Patient's Perspective: Cross-Sectional Study JO - J Med Internet Res SP - e58831 VL - 26 KW - artificial intelligence KW - AI KW - large language models KW - LLM KW - ChatGPT KW - patient education KW - patient information KW - patient perceptions KW - chatbot KW - chatbots KW - empathy N2 - Background: Artificial intelligence and the language models derived from it, such as ChatGPT, offer immense possibilities, particularly in the field of medicine. It is already evident that ChatGPT can provide adequate and, in some cases, expert-level responses to health-related queries and advice for patients. However, it is currently unknown how patients perceive these capabilities, whether they can derive benefit from them, and whether potential risks, such as harmful suggestions, are detected by patients. Objective: This study aims to clarify whether patients can get useful and safe health care advice from an artificial intelligence chatbot assistant. Methods: This cross-sectional study was conducted using 100 publicly available health-related questions from 5 medical specialties (trauma, general surgery, otolaryngology, pediatrics, and internal medicine) from a web-based platform for patients. Responses generated by ChatGPT-4.0 and by an expert panel (EP) of experienced physicians from the aforementioned web-based platform were packed into 10 sets consisting of 10 questions each. The blinded evaluation was carried out by patients regarding empathy and usefulness (assessed through the question: “Would this answer have helped you?”) on a scale from 1 to 5. As a control, evaluation was also performed by 3 physicians in each respective medical specialty, who were additionally asked about the potential harm of the response and its correctness. Results: In total, 200 sets of questions were submitted by 64 patients (mean 45.7, SD 15.9 years; 29/64, 45.3% male), resulting in 2000 evaluated answers of ChatGPT and the EP each.
ChatGPT scored higher in terms of empathy (4.18 vs 2.7; P<.001) and usefulness (4.04 vs 2.98; P<.001). Subanalysis revealed a small bias in terms of levels of empathy given by women in comparison with men (4.46 vs 4.14; P=.049). Ratings of ChatGPT were high regardless of the participant's age. The same highly significant results were observed in the evaluation of the respective specialist physicians. ChatGPT significantly outperformed the EP in correctness (4.51 vs 3.55; P<.001). Specialists rated the usefulness (3.93 vs 4.59) and correctness (4.62 vs 3.84) significantly lower in potentially harmful responses from ChatGPT (P<.001). This was not the case among patients. Conclusions: The results indicate that ChatGPT is capable of supporting patients in health-related queries better than physicians, at least in terms of written advice through a web-based platform. In this study, ChatGPT's responses had a lower percentage of potentially harmful advice than the web-based EP. However, it is crucial to note that this finding is based on a specific study design and may not generalize to all health care settings. Alarmingly, patients are not able to independently recognize these potential dangers. UR - https://www.jmir.org/2024/1/e58831 UR - http://dx.doi.org/10.2196/58831 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/58831 ER - TY - JOUR AU - Ronquillo, G. Jay AU - Ye, Jamie AU - Gorman, Donal AU - Lemeshow, R. Adina AU - Watt, J.
Stephen PY - 2024/9/30 TI - Practical Aspects of Using Large Language Models to Screen Abstracts for Cardiovascular Drug Development: Cross-Sectional Study JO - JMIR Med Inform SP - e64143 VL - 12 KW - biomedical informatics KW - drug development KW - cardiology KW - cardio KW - LLM KW - biomedical KW - drug KW - cross-sectional study KW - biomarker KW - cardiovascular KW - screening optimization KW - GPT KW - large language model KW - AI KW - artificial intelligence UR - https://medinform.jmir.org/2024/1/e64143 UR - http://dx.doi.org/10.2196/64143 ID - info:doi/10.2196/64143 ER - TY - JOUR AU - Nguyen, Hoang Michelle AU - Sedoc, João AU - Taylor, Overby Casey PY - 2024/9/30 TI - Usability, Engagement, and Report Usefulness of Chatbot-Based Family Health History Data Collection: Mixed Methods Analysis JO - J Med Internet Res SP - e55164 VL - 26 KW - family health history KW - chatbots KW - conversational agents KW - digital health tools KW - usability KW - engagement KW - report usefulness KW - evaluation KW - crowdsourcing KW - mixed methods N2 - Background: Family health history (FHx) is an important predictor of a person's genetic risk but is not collected by many adults in the United States. Objective: This study aims to test and compare the usability, engagement, and report usefulness of 2 web-based methods to collect FHx. Methods: This mixed methods study compared FHx data collection using a flow-based chatbot (KIT; the curious interactive test) and a form-based method. KIT's design was optimized to reduce user burden. We recruited and randomized individuals from 2 crowdsourced platforms to 1 of the 2 FHx methods. All participants were asked to complete a questionnaire to assess the method's usability, the usefulness of a report summarizing their experience, user-desired chatbot enhancements, and general user experience. Engagement was studied using log data collected by the methods.
We used qualitative findings from analyzing free-text comments to supplement the primary quantitative results. Results: Participants randomized to KIT reported higher usability than those randomized to the form, with a mean System Usability Scale score of 80.2 versus 61.9 (P<.001), respectively. The engagement analysis reflected design differences in the onboarding process. KIT users spent less time entering FHx information and reported fewer conditions than form users (mean 5.90 vs 7.97 min; P=.04; and mean 7.8 vs 10.1 conditions; P=.04). Both KIT and form users somewhat agreed that the report was useful (Likert scale ratings of 4.08 and 4.29, respectively). Among desired enhancements, personalization was the highest-rated feature (188/205, 91.7% rated medium- to high-priority). Qualitative analyses revealed positive and negative characteristics of both KIT and the form-based method. Among respondents randomized to KIT, most indicated it was easy to use and navigate and that they could respond to and understand user prompts. Negative comments addressed KIT's personality, conversational pace, and ability to manage errors. For KIT and form respondents, qualitative results revealed common themes, including a desire for more information about conditions and a mutual appreciation for the multiple-choice button response format. Respondents also said they wanted to report health information beyond KIT's prompts (eg, personal health history) and for KIT to provide more personalized responses. Conclusions: We showed that KIT provided a usable way to collect FHx. We also identified design considerations to improve chatbot-based FHx data collection: First, the final report summarizing the FHx collection experience should be enhanced to provide more value for patients. Second, the onboarding chatbot prompt may impact data quality and should be carefully considered.
Finally, we highlighted several areas that could be improved by moving from a flow-based chatbot to a large language model implementation strategy. UR - https://www.jmir.org/2024/1/e55164 UR - http://dx.doi.org/10.2196/55164 UR - http://www.ncbi.nlm.nih.gov/pubmed/39348188 ID - info:doi/10.2196/55164 ER - TY - JOUR AU - Salmi, Salim AU - Mérelle, Saskia AU - Gilissen, Renske AU - van der Mei, Rob AU - Bhulai, Sandjai PY - 2024/9/26 TI - The Most Effective Interventions for Classification Model Development to Predict Chat Outcomes Based on the Conversation Content in Online Suicide Prevention Chats: Machine Learning Approach JO - JMIR Ment Health SP - e57362 VL - 11 KW - suicide KW - suicidality KW - suicide prevention KW - helpline KW - suicide helpline KW - classification KW - interpretable AI KW - explainable AI KW - conversations KW - BERT KW - bidirectional encoder representations from transformers KW - machine learning KW - artificial intelligence KW - large language models KW - LLM KW - natural language processing N2 - Background: For the provision of optimal care in a suicide prevention helpline, it is important to know what contributes to positive or negative effects on help seekers. Helplines can often be contacted through text-based chat services, which produce large amounts of text data for use in large-scale analysis. Objective: We trained a machine learning classification model to predict chat outcomes based on the content of the chat conversations in suicide helplines and identified the counsellor utterances that had the most impact on its outputs. Methods: From August 2021 until January 2023, help seekers (N=6903) scored themselves on factors known to be associated with suicidality (eg, hopelessness, feeling entrapped, will to live) before and after a chat conversation with the suicide prevention helpline in the Netherlands (113 Suicide Prevention). Machine learning text analysis was used to predict help seeker scores on these factors. 
Using 2 approaches for interpreting machine learning models, we identified text messages from helpers in a chat that contributed the most to the prediction of the model. Results: According to the machine learning model, helpers' positive affirmations and expressing involvement contributed to improved scores of the help seekers. Use of macros and ending the chat prematurely due to the help seeker being in an unsafe situation had negative effects on help seekers. Conclusions: This study reveals insights for improving helpline chats, emphasizing the value of an evocative style with questions, positive affirmations, and practical advice. It also underscores the potential of machine learning in helpline chat analysis. UR - https://mental.jmir.org/2024/1/e57362 UR - http://dx.doi.org/10.2196/57362 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/57362 ER - TY - JOUR AU - Kumar, Tanuj Ash AU - Wang, Cindy AU - Dong, Alec AU - Rose, Jonathan PY - 2024/9/26 TI - Generation of Backward-Looking Complex Reflections for a Motivational Interviewing–Based Smoking Cessation Chatbot Using GPT-4: Algorithm Development and Validation JO - JMIR Ment Health SP - e53778 VL - 11 KW - motivational interviewing KW - smoking cessation KW - therapy KW - automated therapy KW - natural language processing KW - large language models KW - GPT-4 KW - chatbot KW - dialogue agent KW - reflections KW - reflection generation KW - smoking KW - cessation KW - ChatGPT KW - smokers KW - smoker KW - effectiveness KW - messages N2 - Background: Motivational interviewing (MI) is a therapeutic technique that has been successful in helping smokers reduce smoking but has limited accessibility due to the high cost and low availability of clinicians. To address this, the MIBot project has sought to develop a chatbot that emulates an MI session with a client with the specific goal of moving an ambivalent smoker toward the direction of quitting. 
One key element of an MI conversation is reflective listening, where a therapist expresses their understanding of what the client has said by uttering a reflection that encourages the client to continue their thought process. Complex reflections link the client's responses to relevant ideas and facts to enhance this contemplation. Backward-looking complex reflections (BLCRs) link the client's most recent response to a relevant selection of the client's previous statements. Our current chatbot can generate complex reflections (but not BLCRs) using large language models (LLMs) such as GPT-2, which allows the generation of unique, human-like messages customized to client responses. Recent advancements in these models, such as the introduction of GPT-4, provide a novel way to generate complex text by feeding the models instructions and conversational history directly, making this a promising approach to generate BLCRs. Objective: This study aims to develop a method to generate BLCRs for an MI-based smoking cessation chatbot and to measure the method's effectiveness. Methods: LLMs such as GPT-4 can be stimulated to produce specific types of responses to their inputs by "asking" them with an English-based description of the desired output. These descriptions are called prompts, and the goal of writing a description that causes an LLM to generate the required output is termed prompt engineering. We evolved an instruction to prompt GPT-4 to generate a BLCR, given the portions of the transcript of the conversation up to the point where the reflection was needed. The approach was tested on 50 previously collected MIBot transcripts of conversations with smokers and was used to generate a total of 150 reflections. The quality of the reflections was rated on a 4-point scale by 3 independent raters to determine whether they met specific criteria for acceptability. Results: Of the 150 generated reflections, 132 (88%) met the level of acceptability. 
The remaining 18 (12%) had one or more flaws that made them inappropriate as BLCRs. The 3 raters had pairwise agreement on 80% to 88% of these scores. Conclusions: The method presented to generate BLCRs is good enough to be used as one source of reflections in an MI-style conversation but would need an automatic checker to eliminate the unacceptable ones. This work illustrates the power of the new LLMs to generate therapeutic client-specific responses under the command of a language-based specification. UR - https://mental.jmir.org/2024/1/e53778 UR - http://dx.doi.org/10.2196/53778 ID - info:doi/10.2196/53778 ER - TY - JOUR AU - Shen, Jocelyn AU - DiPaola, Daniella AU - Ali, Safinah AU - Sap, Maarten AU - Park, Won Hae AU - Breazeal, Cynthia PY - 2024/9/25 TI - Empathy Toward Artificial Intelligence Versus Human Experiences and the Role of Transparency in Mental Health and Social Support Chatbot Design: Comparative Study JO - JMIR Ment Health SP - e62679 VL - 11 KW - empathy KW - large language models KW - ethics KW - transparency KW - crowdsourcing KW - human-computer interaction N2 - Background: Empathy is a driving force in our connection to others, our mental well-being, and resilience to challenges. With the rise of generative artificial intelligence (AI) systems, mental health chatbots, and AI social support companions, it is important to understand how empathy unfolds toward stories from human versus AI narrators and how transparency plays a role in user emotions. Objective: We aim to understand how empathy shifts across human-written versus AI-written stories, and how these findings inform ethical implications and human-centered design of using mental health chatbots as objects of empathy. Methods: We conducted crowd-sourced studies with 985 participants who each wrote a personal story and then rated empathy toward 2 retrieved stories, where one was written by a language model, and another was written by a human. 
Our studies varied disclosing whether a story was written by a human or an AI system to see how transparent author information affects empathy toward the narrator. We conducted mixed methods analyses: through statistical tests, we compared users' self-reported state empathy toward the stories across different conditions. In addition, we qualitatively coded open-ended feedback about reactions to the stories to understand how and why transparency affects empathy toward human versus AI storytellers. Results: We found that participants significantly empathized with human-written over AI-written stories in almost all conditions, regardless of whether they were aware (t196=7.07, P<.001, Cohen d=0.60) or not aware (t298=3.46, P<.001, Cohen d=0.24) that an AI system wrote the story. We also found that participants reported greater willingness to empathize with AI-written stories when there was transparency about the story author (t494=−5.49, P<.001, Cohen d=0.36). Conclusions: Our work sheds light on how empathy toward AI or human narrators is tied to the way the text is presented, thus informing ethical considerations of empathetic artificial social support or mental health chatbots. UR - https://mental.jmir.org/2024/1/e62679 UR - http://dx.doi.org/10.2196/62679 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/62679 ER - TY - JOUR AU - MacNeill, Luke A. AU - MacNeill, Lillian AU - Luke, Alison AU - Doucet, Shelley PY - 2024/9/25 TI - Health Professionals' Views on the Use of Conversational Agents for Health Care: Qualitative Descriptive Study JO - J Med Internet Res SP - e49387 VL - 26 KW - conversational agents KW - chatbots KW - health care KW - health professionals KW - health personnel KW - qualitative KW - interview N2 - Background: In recent years, there has been an increase in the use of conversational agents for health promotion and service delivery. To date, health professionals' 
views on the use of this technology have received limited attention in the literature. Objective: The purpose of this study was to gain a better understanding of how health professionals view the use of conversational agents for health care. Methods: Physicians, nurses, and regulated mental health professionals were recruited using various web-based methods. Participants were interviewed individually using the Zoom (Zoom Video Communications, Inc) videoconferencing platform. Interview questions focused on the potential benefits and risks of using conversational agents for health care, as well as the best way to integrate conversational agents into the health care system. Interviews were transcribed verbatim and uploaded to NVivo (version 12; QSR International, Inc) for thematic analysis. Results: A total of 24 health professionals participated in the study (19 women, 5 men; mean age 42.75, SD 10.71 years). Participants said that the use of conversational agents for health care could have certain benefits, such as greater access to care for patients or clients and workload support for health professionals. They also discussed potential drawbacks, such as an added burden on health professionals (eg, program familiarization) and the limited capabilities of these programs. Participants said that conversational agents could be used for routine or basic tasks, such as screening and assessment, providing information and education, and supporting individuals between appointments. They also said that health professionals should have some oversight in terms of the development and implementation of these programs. Conclusions: The results of this study provide insight into health professionals' views on the use of conversational agents for health care, particularly in terms of the benefits and drawbacks of these programs and how they should be integrated into the health care system. 
These collective findings offer useful information and guidance to stakeholders who have an interest in the development and implementation of this technology. UR - https://www.jmir.org/2024/1/e49387 UR - http://dx.doi.org/10.2196/49387 UR - http://www.ncbi.nlm.nih.gov/pubmed/39320936 ID - info:doi/10.2196/49387 ER - TY - JOUR AU - Weng, Xue AU - Yin, Hua AU - Liu, Kefeng AU - Song, Chuyu AU - Xie, Jiali AU - Guo, Ningyuan AU - Wang, Ping Man PY - 2024/9/23 TI - Chatbot-Led Support Combined With Counselor-Led Support on Smoking Cessation in China: Protocol for a Pilot Randomized Controlled Trial JO - JMIR Res Protoc SP - e58636 VL - 13 KW - chatbot KW - smoking cessation KW - mHealth KW - mobile phone KW - campus KW - China N2 - Background: China has a large population of smokers, with half of them dependent on tobacco and in need of cessation assistance, indicating the need for mobile health (mHealth) to provide cessation support. Objective: The study aims to assess the feasibility and preliminary effectiveness of combining chatbot-led support with counselor-led support for smoking cessation among community smokers in China. Methods: This is a 2-arm, parallel, assessor-blinded, pilot randomized controlled trial nested in a smoke-free campus campaign in Zhuhai, China. All participants will receive brief face-to-face cessation advice and group cessation support led by a chatbot embedded in WeChat. In addition, participants in the intervention group will receive personalized WeChat-based counseling from trained counselors. Follow-up will occur at 1, 3, and 6 months after treatment initiation. The primary smoking outcome is bioverified abstinence (exhaled carbon monoxide <4 parts per million or salivary cotinine <30 ng/mL) at 6 months. Secondary outcomes include self-reported 7-day point prevalence of abstinence, smoking reduction rate, and quit attempts. Feasibility outcomes include eligibility rate, consent rate, intervention engagement, and retention rate. 
An intention-to-treat approach and regression models will be used for primary analyses. Results: Participant recruitment began in March 2023, and the intervention began in April 2023. The data collection was completed in June 2024. The results of the study will be published in peer-reviewed journals and presented at international conferences. Conclusions: This study will provide novel insights into the feasibility and preliminary effectiveness of a chatbot-led intervention for smoking cessation in China. The findings of this study will inform the development and optimization of mHealth interventions for smoking cessation in China and other low- and middle-income countries. Trial Registration: ClinicalTrials.gov NCT05777005; https://clinicaltrials.gov/study/NCT05777005 International Registered Report Identifier (IRRID): DERR1-10.2196/58636 UR - https://www.researchprotocols.org/2024/1/e58636 UR - http://dx.doi.org/10.2196/58636 UR - http://www.ncbi.nlm.nih.gov/pubmed/39312291 ID - info:doi/10.2196/58636 ER - TY - JOUR AU - Wei, Bin AU - Hu, Xin AU - Wu, XiaoRong PY - 2024/9/17 TI - Considerations and Challenges in the Application of Large Language Models for Patient Complaint Resolution JO - J Med Internet Res SP - e65527 VL - 26 KW - ChatGPT KW - large language model KW - LLM KW - artificial intelligence KW - AI KW - patient complaint KW - empathy KW - efficiency KW - patient satisfaction KW - resource allocation UR - https://www.jmir.org/2024/1/e65527 UR - http://dx.doi.org/10.2196/65527 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/65527 ER - TY - JOUR AU - Kisvarday, Susannah AU - Yan, Adam AU - Yarahuan, Julia AU - Kats, J. Daniel AU - Ray, Mondira AU - Kim, Eugene AU - Hong, Peter AU - Spector, Jacob AU - Bickel, Jonathan AU - Parsons, Chase AU - Rabbani, Naveed AU - Hron, D. 
Jonathan PY - 2024/9/12 TI - ChatGPT Use Among Pediatric Health Care Providers: Cross-Sectional Survey Study JO - JMIR Form Res SP - e56797 VL - 8 KW - ChatGPT KW - machine learning KW - surveys and questionnaires KW - medical informatics applications KW - OpenAI KW - large language model KW - LLM KW - pediatric KW - chatbot KW - artificial intelligence KW - AI KW - digital tools N2 - Background: The public launch of OpenAI's ChatGPT platform generated immediate interest in the use of large language models (LLMs). Health care institutions are now grappling with establishing policies and guidelines for the use of these technologies, yet little is known about how health care providers view LLMs in medical settings. Moreover, there are no studies assessing how pediatric providers are adopting these readily accessible tools. Objective: The aim of this study was to determine how pediatric providers are currently using LLMs in their work as well as their interest in using a Health Insurance Portability and Accountability Act (HIPAA)–compliant version of ChatGPT in the future. Methods: A survey instrument consisting of structured and unstructured questions was iteratively developed by a team of informaticians from various pediatric specialties. The survey was sent via Research Electronic Data Capture (REDCap) to all Boston Children's Hospital pediatric providers. Participation was voluntary and uncompensated, and all survey responses were anonymous. Results: Surveys were completed by 390 pediatric providers. Approximately 50% (197/390) of respondents had used an LLM; of these, almost 75% (142/197) were already using an LLM for nonclinical work and 27% (52/195) for clinical work. Providers detailed the various ways they are currently using an LLM in their clinical and nonclinical work. 
Only 29% (n=105) of 362 respondents indicated that ChatGPT should be used for patient care in its present state; however, 73.8% (273/368) reported they would use a HIPAA-compliant version of ChatGPT if one were available. Providers' proposed future uses of LLMs in health care are described. Conclusions: Despite significant concerns and barriers to LLM use in health care, pediatric providers are already using LLMs at work. This study will give policy makers needed information about how providers are using LLMs clinically. UR - https://formative.jmir.org/2024/1/e56797 UR - http://dx.doi.org/10.2196/56797 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/56797 ER - TY - JOUR AU - Sanjeewa, Ruvini AU - Iyer, Ravi AU - Apputhurai, Pragalathan AU - Wickramasinghe, Nilmini AU - Meyer, Denny PY - 2024/9/9 TI - Empathic Conversational Agent Platform Designs and Their Evaluation in the Context of Mental Health: Systematic Review JO - JMIR Ment Health SP - e58974 VL - 11 KW - conversational agents KW - chatbots KW - virtual assistants KW - empathy KW - emotionally aware KW - mental health KW - mental well-being N2 - Background: The demand for mental health (MH) services in the community continues to exceed supply. At the same time, technological developments make the use of artificial intelligence–empowered conversational agents (CAs) a real possibility to help fill this gap. Objective: The objective of this review was to identify existing empathic CA design architectures within the MH care sector and to assess their technical performance in detecting and responding to user emotions in terms of classification accuracy. In addition, the approaches used to evaluate empathic CAs within the MH care sector in terms of their acceptability to users were considered. Finally, this review aimed to identify limitations and future directions for empathic CAs in MH care. 
Methods: A systematic literature search was conducted across 6 academic databases to identify journal articles and conference proceedings using search terms covering 3 topics: "conversational agents," "mental health," and "empathy." Only studies discussing CA interventions for the MH care domain were eligible for this review, with both textual and vocal characteristics considered as possible data inputs. Quality was assessed using appropriate risk of bias and quality tools. Results: A total of 19 articles met all inclusion criteria. Most (12/19, 63%) of these empathic CA designs in MH care were machine learning (ML) based, with 26% (5/19) hybrid engines and 11% (2/19) rule-based systems. Among the ML-based CAs, 47% (9/19) used neural networks, with transformer-based architectures being well represented (7/19, 37%). The remaining 16% (3/19) of the ML models were unspecified. Technical assessments of these CAs focused on response accuracies and their ability to recognize, predict, and classify user emotions. While single-engine CAs demonstrated good accuracy, the hybrid engines achieved higher accuracy and provided more nuanced responses. Of the 19 studies, human evaluations were conducted in 16 (84%), with only 5 (26%) focusing directly on the CA's empathic features. All these papers used self-reports for measuring empathy, including single or multiple (scale) ratings or qualitative feedback from in-depth interviews. Only 1 (5%) paper included evaluations by both CA users and experts, adding more value to the process. Conclusions: The integration of CA design and its evaluation is crucial to produce empathic CAs. Future studies should focus on using a clear definition of empathy and standardized scales for empathy measurement, ideally including expert assessment. In addition, the diversity in measures used for technical assessment and evaluation poses a challenge for comparing CA performances, which future research should also address. 
However, CAs with good technical and empathic performance are already available to users of MH care services, showing promise for new applications, such as helpline services. UR - https://mental.jmir.org/2024/1/e58974 UR - http://dx.doi.org/10.2196/58974 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/58974 ER - TY - JOUR AU - Quinn, Kelly AU - Leiser Ransom, Sarah AU - O'Connell, Carrie AU - Muramatsu, Naoko AU - Marquez, X. David AU - Chin, Jessie PY - 2024/8/30 TI - Assessing the Feasibility and Acceptability of Smart Speakers in Behavioral Intervention Research With Older Adults: Mixed Methods Study JO - J Med Internet Res SP - e54800 VL - 26 KW - smart speakers KW - physical activity KW - older adults KW - behavioral health KW - intervention KW - smart device KW - smart devices KW - conversational agent KW - physical activities KW - behavioral intervention KW - intervention research N2 - Background: Smart speakers, such as Amazon's Echo and Google's Nest Home, combine natural language processing with a conversational interface to carry out everyday tasks, like playing music and finding information. Easy to use, they are embraced by older adults, including those with limited physical function, vision, or computer literacy. While smart speakers are increasingly used for research purposes (eg, implementing interventions and automatically recording selected research data), information on the advantages and disadvantages of using these devices for studies related to health promotion programs is limited. Objective: This study evaluates the feasibility and acceptability of using smart speakers to deliver a physical activity (PA) program designed to help older adults enhance their physical well-being. Methods: Community-dwelling older adults (n=18) were asked to use a custom smart speaker app to participate in an evidence-based, low-impact PA program for 10 weeks. 
Collected data, including measures of technology acceptance, interviews, field notes, and device logs, were analyzed using a concurrent mixed analysis approach. Technology acceptance measures were evaluated using time series ANOVAs to examine acceptability, appropriateness, feasibility, and intention to adopt smart speaker technology. Device logs provided evidence of interaction with and adoption of the device and the intervention. Interviews and field notes were thematically coded to triangulate the quantitative measures and further expand on factors relating to intervention fidelity. Results: Smart speakers were found to be acceptable for administering a PA program, as participants reported that the devices were highly usable (mean 5.02, SE 0.38) and had strong intentions to continue their use (mean 5.90, SE 0.39). Factors such as the voice-user interface and engagement with the device on everyday tasks were identified as meaningful to acceptability. The feasibility of the devices for research activity, however, was mixed. Despite the participants rating the smart speakers as easy to use (mean 5.55, SE 1.16), functional and technical factors, such as Wi-Fi connectivity and appropriate command phrasing, required the provision of additional support resources to participants and potentially impaired intervention fidelity. Conclusions: Smart speakers present an acceptable and appropriate behavioral intervention technology for PA programs directed at older adults but entail additional requirements for resource planning, technical support, and troubleshooting to ensure their feasibility for the research context and for fidelity of the intervention. 
UR - https://www.jmir.org/2024/1/e54800 UR - http://dx.doi.org/10.2196/54800 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/54800 ER - TY - JOUR AU - Hindelang, Michael AU - Sitaru, Sebastian AU - Zink, Alexander PY - 2024/8/29 TI - Transforming Health Care Through Chatbots for Medical History-Taking and Future Directions: Comprehensive Systematic Review JO - JMIR Med Inform SP - e56628 VL - 12 KW - medical history-taking KW - chatbots KW - artificial intelligence KW - natural language processing KW - health care data collection KW - patient engagement KW - clinical decision-making KW - usability KW - acceptability KW - systematic review KW - diagnostic accuracy KW - patient-doctor communication KW - cybersecurity KW - machine learning KW - conversational agents KW - health informatics N2 - Background: The integration of artificial intelligence and chatbot technology in health care has attracted significant attention due to its potential to improve patient care and streamline history-taking. As artificial intelligence–driven conversational agents, chatbots offer the opportunity to revolutionize history-taking, necessitating a comprehensive examination of their impact on medical practice. Objective: This systematic review aims to assess the role, effectiveness, usability, and patient acceptance of chatbots in medical history–taking. It also examines potential challenges and future opportunities for integration into clinical practice. Methods: A systematic search included PubMed, Embase, MEDLINE (via Ovid), CENTRAL, Scopus, and Open Science and covered studies through July 2024. The inclusion and exclusion criteria for the studies reviewed were based on the PICOS (participants, interventions, comparators, outcomes, and study design) framework. The population included individuals using health care chatbots for medical history–taking. Interventions focused on chatbots designed to facilitate medical history–taking. 
The outcomes of interest were the feasibility, acceptance, and usability of chatbot-based medical history–taking. Studies not reporting on these outcomes were excluded. All study designs except conference papers were eligible for inclusion. Only English-language studies were considered. There were no specific restrictions on study duration. Key search terms included "chatbot*," "conversational agent*," "virtual assistant," "artificial intelligence chatbot," "medical history," and "history-taking." The quality of observational studies was classified using the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) criteria (eg, sample size, design, data collection, and follow-up). The RoB 2 (Risk of Bias) tool assessed areas and the levels of bias in randomized controlled trials (RCTs). Results: The review included 15 observational studies and 3 RCTs and synthesized evidence from different medical fields and populations. Chatbots systematically collect information through targeted queries and data retrieval, improving patient engagement and satisfaction. The results show that chatbots have great potential for history-taking and that the efficiency and accessibility of the health care system can be improved by 24/7 automated data collection. Bias assessments revealed that of the 15 observational studies, 5 (33%) studies were of high quality, 5 (33%) studies were of moderate quality, and 5 (33%) studies were of low quality. Of the RCTs, 2 had a low risk of bias, while 1 had a high risk. Conclusions: This systematic review provides critical insights into the potential benefits and challenges of using chatbots for medical history–taking. The included studies showed that chatbots can increase patient engagement, streamline data collection, and improve health care decision-making. 
For effective integration into clinical practice, it is crucial to design user-friendly interfaces, ensure robust data security, and maintain empathetic patient-physician interactions. Future research should focus on refining chatbot algorithms, improving their emotional intelligence, and extending their application to different health care settings to realize their full potential in modern medicine. Trial Registration: PROSPERO CRD42023410312; www.crd.york.ac.uk/prospero UR - https://medinform.jmir.org/2024/1/e56628 UR - http://dx.doi.org/10.2196/56628 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/56628 ER - TY - JOUR AU - Xu, Tianhui AU - Weng, Huiting AU - Liu, Fang AU - Yang, Li AU - Luo, Yuanyuan AU - Ding, Ziwei AU - Wang, Qin PY - 2024/8/28 TI - Current Status of ChatGPT Use in Medical Education: Potentials, Challenges, and Strategies JO - J Med Internet Res SP - e57896 VL - 26 KW - chat generative pretrained transformer KW - ChatGPT KW - artificial intelligence KW - medical education KW - natural language processing KW - clinical practice UR - https://www.jmir.org/2024/1/e57896 UR - http://dx.doi.org/10.2196/57896 UR - http://www.ncbi.nlm.nih.gov/pubmed/39196640 ID - info:doi/10.2196/57896 ER - TY - JOUR AU - Suffoletto, Brian PY - 2024/8/27 TI - Deceptively Simple yet Profoundly Impactful: Text Messaging Interventions to Support Health JO - J Med Internet Res SP - e58726 VL - 26 KW - SMS intervention KW - behavior KW - intervention KW - review KW - text messaging KW - SMS KW - interventions KW - behaviors KW - behaviour KW - behaviours KW - effectiveness KW - development KW - impact KW - narrative review KW - physical activity KW - diet KW - weight loss KW - mental health KW - substance use KW - meta-analysis KW - chatbot KW - chatbots KW - large language model KW - LLM KW - large language models KW - mobile phone UR - https://www.jmir.org/2024/1/e58726 UR - http://dx.doi.org/10.2196/58726 UR - http://www.ncbi.nlm.nih.gov/pubmed/39190427 
ID - info:doi/10.2196/58726 ER - TY - JOUR AU - Tan, Kian Cheng AU - Lou, Q. Vivian W. AU - Cheng, Man Clio Yuen AU - He, Chu Phoebe AU - Khoo, Joo Veronica Eng PY - 2024/8/23 TI - Improving the Social Well-Being of Single Older Adults Using the LOVOT Social Robot: Qualitative Phenomenological Study JO - JMIR Hum Factors SP - e56669 VL - 11 KW - companionship KW - older adults KW - social well-being KW - pets KW - social robots KW - elderly KW - wellbeing KW - qualitative research KW - robot KW - companion KW - body temperature KW - development KW - research design KW - design KW - interviews KW - psychosocial support KW - support KW - psychosocial KW - temperature regulation KW - social KW - care home KW - aging KW - ageing KW - robotics KW - well-being KW - loneliness KW - technology KW - mobile phone N2 - Background: This study examined the social well-being of single older adults through the companionship of a social robot, LOVOT (Love+Robot; Groove X). It is designed as a companion for older adults, providing love and affection through verbal and physical interaction. We investigated older adults' perceptions of the technology and how they benefitted from interacting with LOVOT, to guide the future development of social robots. Objective: This study aimed to use a phenomenological research design to understand the participants' experiences of companionship provided by the social robot. Our research focused on (1) examining the social well-being of single older adults through the companionship of social robots and (2) understanding the perceptions of single older adults when interacting with social robots. Given the prevalence of technology use to support aging, understanding single older adults' social well-being and their perceptions of social robots is essential to guide future research on and design of social robots. Methods: A total of 5 single women, aged 60 to 75 years, participated in the study. 
The participants interacted independently with the robot for a week in their own homes and then participated in a poststudy interview to share their experiences. Results: In total, 4 main themes emerged from the participants' interactions with LOVOT: caring for a social robot, the comforting presence of the social robot, meaningful connections with the social robot, and preference for LOVOT over pets. Conclusions: The results indicate that single older adults can obtain psychosocial support by interacting with LOVOT. LOVOT is easily accepted as a companion and makes single older adults feel like they have a greater sense of purpose and someone to connect with. This study suggests that social robots can provide companionship to older adults who live alone. Social robots can help alleviate loneliness by allowing single older adults to form social connections with robots as companions. These findings are particularly important given the rapid aging of the population and the increasing number of single-person households in Singapore. UR - https://humanfactors.jmir.org/2024/1/e56669 UR - http://dx.doi.org/10.2196/56669 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/56669 ER - TY - JOUR AU - Thomae, V. Anita AU - Witt, M. Claudia AU - Barth, Jürgen PY - 2024/8/22 TI - Integration of ChatGPT Into a Course for Medical Students: Explorative Study on Teaching Scenarios, Students' Perception, and Applications JO - JMIR Med Educ SP - e50545 VL - 10 KW - medical education KW - ChatGPT KW - artificial intelligence KW - information for patients KW - critical appraisal KW - evaluation KW - blended learning KW - AI KW - digital skills KW - teaching N2 - Background: Text-generating artificial intelligence (AI) such as ChatGPT offers many opportunities and challenges in medical education. Acquiring practical skills necessary for using AI in a clinical context is crucial, especially for medical education. 
Objective: This explorative study aimed to investigate the feasibility of integrating ChatGPT into teaching units and to evaluate the course and the importance of AI-related competencies for medical students. Since a possible application of ChatGPT in the medical field could be the generation of information for patients, we further investigated how such information is perceived by students in terms of persuasiveness and quality. Methods: ChatGPT was integrated into 3 different teaching units of a blended learning course for medical students. Using a mixed methods approach, quantitative and qualitative data were collected. As baseline data, we assessed students' characteristics, including their openness to digital innovation. The students evaluated the integration of ChatGPT into the course and shared their thoughts regarding the future of text-generating AI in medical education. The course was evaluated based on the Kirkpatrick Model, with satisfaction, learning progress, and applicable knowledge considered as key assessment levels. In ChatGPT-integrating teaching units, students evaluated videos featuring information for patients regarding their persuasiveness on treatment expectations in a self-experience experiment and critically reviewed information for patients written using ChatGPT 3.5 based on different prompts. Results: A total of 52 medical students participated in the study. The comprehensive evaluation of the course revealed elevated levels of satisfaction, learning progress, and applicability specifically in relation to the ChatGPT-integrating teaching units. Furthermore, all evaluation levels demonstrated an association with each other. Higher openness to digital innovation was associated with higher satisfaction and, to a lesser extent, with higher applicability. AI-related competencies in other courses of the medical curriculum were perceived as highly important by medical students.
Qualitative analysis highlighted potential use cases of ChatGPT in teaching and learning. In ChatGPT-integrating teaching units, students rated information for patients generated using a basic ChatGPT prompt as "moderate" in terms of comprehensibility, patient safety, and the correct application of communication rules taught during the course. The students' ratings were considerably improved using an extended prompt. The same text, however, showed the smallest increase in treatment expectations when compared with information provided by humans (patient, clinician, and expert) via videos. Conclusions: This study offers valuable insights into integrating the development of AI competencies into a blended learning course. Integration of ChatGPT enhanced learning experiences for medical students. UR - https://mededu.jmir.org/2024/1/e50545 UR - http://dx.doi.org/10.2196/50545 ID - info:doi/10.2196/50545 ER - TY - JOUR AU - Bendotti, Hollie AU - Lawler, Sheleigh AU - Ireland, David AU - Gartner, Coral AU - Marshall, M. Henry PY - 2024/8/19 TI - Co-Designing a Smoking Cessation Chatbot: Focus Group Study of End Users and Smoking Cessation Professionals JO - JMIR Hum Factors SP - e56505 VL - 11 KW - artificial intelligence KW - chatbot KW - smoking cessation KW - behavior change KW - smoking KW - mobile health KW - apps KW - digital interventions KW - smartphone KW - mobile phone N2 - Background: Our prototype smoking cessation chatbot, Quin, provides evidence-based, personalized support delivered via a smartphone app to help people quit smoking. We developed Quin using a multiphase program of co-design research, part of which included focus group evaluation of Quin among stakeholders prior to clinical testing. Objective: This study aimed to gather and compare feedback on the user experience of the Quin prototype from end users and smoking cessation professionals (SCPs) via a beta testing process to inform ongoing chatbot iterations and refinements.
Methods: Following active and passive recruitment, we conducted web-based focus groups with SCPs and end users from Queensland, Australia. Participants tested the app for 1-2 weeks prior to focus group discussion and could also log conversation feedback within the app. Focus groups of SCPs were completed first to review the breadth and accuracy of information, and feedback was prioritized and implemented as major updates using Agile processes prior to end user focus groups. We categorized logged in-app feedback using content analysis and thematically analyzed focus group transcripts. Results: In total, 6 focus groups were completed between August 2022 and June 2023; 3 for SCPs (n=9 participants) and 3 for end users (n=7 participants). Four SCPs had previously smoked; most end users currently smoked cigarettes (n=5), and 2 had quit smoking. The mean duration of focus groups was 58 (SD 10.9; range 46-74) minutes. We identified four major themes from focus group feedback: (1) conversation design, (2) functionality, (3) relationality and anthropomorphism, and (4) role as a smoking cessation support tool. In response to SCPs' feedback, we made two major updates to Quin between cohorts: (1) improvements to conversation flow and (2) addition of the "Moments of Crisis" conversation tree. Participant feedback also informed 17 recommendations for future smoking cessation chatbot developments. Conclusions: Feedback from end users and SCPs highlighted the importance of chatbot functionality, as this underpinned Quin's conversation design and relationality. The ready accessibility of accurate cessation information and impartial support that Quin provided was recognized as a key benefit for end users, the latter of which contributed to a feeling of accountability to the chatbot. Findings will inform the ongoing development of a mature prototype for clinical testing.
UR - https://humanfactors.jmir.org/2024/1/e56505 UR - http://dx.doi.org/10.2196/56505 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/56505 ER - TY - JOUR AU - Matsui, Kentaro AU - Utsumi, Tomohiro AU - Aoki, Yumi AU - Maruki, Taku AU - Takeshima, Masahiro AU - Takaesu, Yoshikazu PY - 2024/8/16 TI - Human-Comparable Sensitivity of Large Language Models in Identifying Eligible Studies Through Title and Abstract Screening: 3-Layer Strategy Using GPT-3.5 and GPT-4 for Systematic Reviews JO - J Med Internet Res SP - e52758 VL - 26 KW - systematic review KW - screening KW - GPT-3.5 KW - GPT-4 KW - language model KW - information science KW - library science KW - artificial intelligence KW - prompt engineering KW - meta-analysis N2 - Background: The screening process for systematic reviews is resource-intensive. Although previous machine learning solutions have reported reductions in workload, they risked excluding relevant papers. Objective: We evaluated the performance of a 3-layer screening method using GPT-3.5 and GPT-4 to streamline the title- and abstract-screening process for systematic reviews. Our goal is to develop a screening method that maximizes sensitivity for identifying relevant records. Methods: We conducted screenings on 2 of our previous systematic reviews related to the treatment of bipolar disorder, with 1381 records from the first review and 3146 from the second. Screenings were conducted using GPT-3.5 (gpt-3.5-turbo-0125) and GPT-4 (gpt-4-0125-preview) across three layers: (1) research design, (2) target patients, and (3) interventions and controls. The 3-layer screening was conducted using prompts tailored to each study. During this process, information extraction according to each study's inclusion criteria and optimization for screening were carried out using a GPT-4–based flow without manual adjustments.
Records were evaluated at each layer, and those meeting the inclusion criteria at all layers were subsequently judged as included. Results: On each layer, both GPT-3.5 and GPT-4 were able to process about 110 records per minute, and the total time required for screening the first and second studies was approximately 1 hour and 2 hours, respectively. In the first study, the sensitivities/specificities of the GPT-3.5 and GPT-4 were 0.900/0.709 and 0.806/0.996, respectively. Both screenings by GPT-3.5 and GPT-4 judged all 6 records used for the meta-analysis as included. In the second study, the sensitivities/specificities of the GPT-3.5 and GPT-4 were 0.958/0.116 and 0.875/0.855, respectively. The sensitivities for the relevant records align with those of human evaluators: 0.867-1.000 for the first study and 0.776-0.979 for the second study. Both screenings by GPT-3.5 and GPT-4 judged all 9 records used for the meta-analysis as included. After accounting for justifiably excluded records by GPT-4, the sensitivities/specificities of the GPT-4 screening were 0.962/0.996 in the first study and 0.943/0.855 in the second study. Further investigation indicated that the cases incorrectly excluded by GPT-3.5 were due to a lack of domain knowledge, while the cases incorrectly excluded by GPT-4 were due to misinterpretations of the inclusion criteria. Conclusions: Our 3-layer screening method with GPT-4 demonstrated an acceptable level of sensitivity and specificity that supports its practical application in systematic review screenings. Future research should aim to generalize this approach and explore its effectiveness in diverse settings, both medical and nonmedical, to fully establish its use and operational feasibility.
UR - https://www.jmir.org/2024/1/e52758 UR - http://dx.doi.org/10.2196/52758 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/52758 ER - TY - JOUR AU - Holderried, Friederike AU - Stegemann-Philipps, Christian AU - Herrmann-Werner, Anne AU - Festl-Wietek, Teresa AU - Holderried, Martin AU - Eickhoff, Carsten AU - Mahling, Moritz PY - 2024/8/16 TI - A Language Model–Powered Simulated Patient With Automated Feedback for History Taking: Prospective Study JO - JMIR Med Educ SP - e59213 VL - 10 KW - virtual patients communication KW - communication skills KW - technology enhanced education KW - TEL KW - medical education KW - ChatGPT KW - GPT: LLM KW - LLMs KW - NLP KW - natural language processing KW - machine learning KW - artificial intelligence KW - language model KW - language models KW - communication KW - relationship KW - relationships KW - chatbot KW - chatbots KW - conversational agent KW - conversational agents KW - history KW - histories KW - simulated KW - student KW - students KW - interaction KW - interactions N2 - Background: Although history taking is fundamental for diagnosing medical conditions, teaching and providing feedback on the skill can be challenging due to resource constraints. Virtual simulated patients and web-based chatbots have thus emerged as educational tools, with recent advancements in artificial intelligence (AI) such as large language models (LLMs) enhancing their realism and potential to provide feedback. Objective: In our study, we aimed to evaluate the effectiveness of a Generative Pretrained Transformer (GPT) 4 model to provide structured feedback on medical students' performance in history taking with a simulated patient. Methods: We conducted a prospective study involving medical students performing history taking with a GPT-powered chatbot. To that end, we designed a chatbot to simulate patients' responses and provide immediate feedback on the comprehensiveness of the students' history taking. Students'
interactions with the chatbot were analyzed, and feedback from the chatbot was compared with feedback from a human rater. We measured interrater reliability and performed a descriptive analysis to assess the quality of feedback. Results: Most of the study's participants were in their third year of medical school. A total of 1894 question-answer pairs from 106 conversations were included in our analysis. GPT-4's role-play and responses were medically plausible in more than 99% of cases. Interrater reliability between GPT-4 and the human rater showed "almost perfect" agreement (Cohen κ=0.832). Less agreement (κ<0.6) detected for 8 out of 45 feedback categories highlighted topics about which the model's assessments were overly specific or diverged from human judgement. Conclusions: The GPT model was effective in providing structured feedback on history-taking dialogs provided by medical students. Although we unraveled some limitations regarding the specificity of feedback for certain feedback categories, the overall high agreement with human raters suggests that LLMs can be a valuable tool for medical education. Our findings, thus, advocate the careful integration of AI-driven feedback mechanisms in medical training and highlight important aspects when LLMs are used in that context. UR - https://mededu.jmir.org/2024/1/e59213 UR - http://dx.doi.org/10.2196/59213 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/59213 ER - TY - JOUR AU - Wu, Gloria AU - Lee, A.
David AU - Zhao, Weichen AU - Wong, Adrial AU - Jhangiani, Rohan AU - Kurniawan, Sri PY - 2024/8/15 TI - ChatGPT and Google Assistant as a Source of Patient Education for Patients With Amblyopia: Content Analysis JO - J Med Internet Res SP - e52401 VL - 26 KW - ChatGPT KW - Google Assistant KW - amblyopia KW - health literacy KW - American Association for Pediatric Ophthalmology and Strabismus KW - pediatric KW - ophthalmology KW - patient education KW - education KW - ophthalmologist KW - Google KW - monitoring N2 - Background: We queried ChatGPT (OpenAI) and Google Assistant about amblyopia and compared their answers with the keywords found on the American Association for Pediatric Ophthalmology and Strabismus (AAPOS) website, specifically the section on amblyopia. Out of the 26 keywords chosen from the website, ChatGPT included 11 (42%) in its responses, while Google included 8 (31%). Objective: Our study investigated the adherence of ChatGPT-3.5 and Google Assistant to the guidelines of the AAPOS for patient education on amblyopia. Methods: ChatGPT-3.5 was used. The four questions taken from the AAPOS website, specifically its glossary section for amblyopia, are as follows: (1) What is amblyopia? (2) What causes amblyopia? (3) How is amblyopia treated? (4) What happens if amblyopia is untreated? Approved and selected by ophthalmologists (GW and DL), the keywords from AAPOS were words or phrases that were deemed significant for the education of patients with amblyopia. The "Flesch-Kincaid Grade Level" formula, approved by the US Department of Education, was used to evaluate the reading comprehension level for the responses from ChatGPT, Google Assistant, and AAPOS. Results: In their responses, ChatGPT did not mention the term "ophthalmologist," whereas Google Assistant and AAPOS mentioned the term once and twice, respectively. ChatGPT did, however, use the term "eye doctors" once.
According to the Flesch-Kincaid test, the average reading level of AAPOS was 11.4 (SD 2.1; the lowest level) while that of Google was 13.1 (SD 4.8; the highest required reading level), also showing the greatest variation in grade level in its responses. ChatGPT's answers, on average, scored at a 12.4 (SD 1.1) grade level. They were all similar in terms of difficulty level in reading. For the keywords, out of the 4 responses, ChatGPT used 42% (11/26) of the keywords, whereas Google Assistant used 31% (8/26). Conclusions: ChatGPT trains on texts and phrases and generates new sentences, while Google Assistant automatically copies website links. As ophthalmologists, we should consider including "see an ophthalmologist" on our websites and journals. While ChatGPT is here to stay, we, as physicians, need to monitor its answers. UR - https://www.jmir.org/2024/1/e52401 UR - http://dx.doi.org/10.2196/52401 UR - http://www.ncbi.nlm.nih.gov/pubmed/39146013 ID - info:doi/10.2196/52401 ER - TY - JOUR AU - Gawey, Lauren AU - Dagenet, B. Caitlyn AU - Tran, A. Khiem AU - Park, Sarah AU - Hsiao, L. Jennifer AU - Shi, Vivian PY - 2024/8/14 TI - Readability of Information Generated by ChatGPT for Hidradenitis Suppurativa JO - JMIR Dermatol SP - e55204 VL - 7 KW - hidradenitis suppurativa KW - ChatGPT KW - Chat-GPT KW - chatbot KW - chatbots KW - chat-bot KW - chat-bots KW - machine learning KW - ML KW - artificial intelligence KW - AI KW - algorithm KW - algorithms KW - predictive model KW - predictive models KW - predictive analytics KW - predictive system KW - practical model KW - practical models KW - deep learning KW - patient resources KW - readability UR - https://derma.jmir.org/2024/1/e55204 UR - http://dx.doi.org/10.2196/55204 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/55204 ER - TY - JOUR AU - Ayo-Ajibola, Oluwatobiloba AU - Davis, J. Ryan AU - Lin, E. Matthew AU - Riddell, Jeffrey AU - Kravitz, L.
Richard PY - 2024/8/14 TI - Characterizing the Adoption and Experiences of Users of Artificial Intelligence–Generated Health Information in the United States: Cross-Sectional Questionnaire Study JO - J Med Internet Res SP - e55138 VL - 26 KW - artificial intelligence KW - ChatGPT KW - health information KW - patient information-seeking KW - online health information KW - health literacy KW - ResearchMatch KW - users KW - diagnosis KW - decision-making KW - cross-sectional KW - survey KW - surveys KW - adoption KW - utilization KW - AI KW - less-educated KW - poor health KW - worse health KW - experience KW - experiences KW - user KW - non user KW - non users KW - AI-generated KW - implication KW - implications KW - medical practice KW - medical practices KW - public health KW - descriptive statistics KW - t test KW - t tests KW - chi-square test KW - chi-square tests KW - health-seeking behavior KW - health-seeking behaviors KW - patient-provider KW - interaction KW - interactions KW - patient KW - patients N2 - Background: OpenAI's ChatGPT is a source of advanced online health information (OHI) that may be integrated into individuals' health information-seeking routines. However, concerns have been raised about its factual accuracy and impact on health outcomes. To forecast implications for medical practice and public health, more information is needed on who uses the tool, how often, and for what. Objective: This study aims to characterize the reasons for and types of ChatGPT OHI use and describe the users most likely to engage with the platform. Methods: In this cross-sectional survey, patients received invitations to participate via the ResearchMatch platform, a nonprofit affiliate of the National Institutes of Health. A web-based survey measured demographic characteristics, use of ChatGPT and other sources of OHI, experience characterization, and resultant health behaviors. Descriptive statistics were used to summarize the data.
Both 2-tailed t tests and Pearson chi-square tests were used to compare users of ChatGPT OHI to nonusers. Results: Of 2406 respondents, 21.5% (n=517) reported using ChatGPT for OHI. ChatGPT users were younger than nonusers (32.8 vs 39.1 years, P<.001) with lower advanced degree attainment (BA or higher; 49.9% vs 67%, P<.001) and greater use of transient health care (ED and urgent care; P<.001). ChatGPT users were more avid consumers of general non-ChatGPT OHI (percentage of weekly or greater OHI seeking frequency in past 6 months, 28.2% vs 22.8%, P<.001). Around 39.3% (n=206) of respondents endorsed using the platform for OHI 2-3 times weekly or more, and most sought the tool to determine if a consultation was required (47.4%, n=245) or to explore alternative treatment (46.2%, n=239). Use characterization was favorable as many believed ChatGPT to be just as or more useful than other OHIs (87.7%, n=429) and their doctor (81%, n=407). About one-third of respondents requested a referral (35.6%, n=184) or changed medications (31%, n=160) based on the information received from ChatGPT. Because many users reported skepticism regarding the ChatGPT output (67.9%, n=336), most turned to their physicians (67.5%, n=349). Conclusions: This study underscores the significant role of AI-generated OHI in shaping health-seeking behaviors and the potential evolution of patient-provider interactions. Given the proclivity of these users to enact health behavior changes based on AI-generated content, there is an opportunity for physicians to guide ChatGPT OHI users on an informed and examined use of the technology.
UR - https://www.jmir.org/2024/1/e55138 UR - http://dx.doi.org/10.2196/55138 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/55138 ER - TY - JOUR AU - Mancinelli, Elisa AU - Magnolini, Simone AU - Gabrielli, Silvia AU - Salcuni, Silvia PY - 2024/8/14 TI - A Chatbot (Juno) Prototype to Deploy a Behavioral Activation Intervention to Pregnant Women: Qualitative Evaluation Using a Multiple Case Study JO - JMIR Form Res SP - e58653 VL - 8 KW - chatbot prototype KW - co-design KW - pregnancy KW - prevention KW - behavioral activation KW - multiple case study N2 - Background: Despite the increasing focus on perinatal care, preventive digital interventions are still scarce. Furthermore, the literature suggests that the design and development of these interventions are mainly conducted through a top-down approach that limitedly accounts for direct end user perspectives. Objective: Building from a previous co-design study, this study aimed to qualitatively evaluate pregnant women's experiences with a chatbot (Juno) prototype designed to deploy a preventive behavioral activation intervention. Methods: Using a multiple-case study design, the research aims to uncover similarities and differences in participants' perceptions of the chatbot while also exploring women's desires for improvement and technological advancements in chatbot-based interventions in perinatal mental health. Five pregnant women interacted weekly with the chatbot, operationalized in Telegram, following a 6-week intervention. Self-report questionnaires were administered at baseline and postintervention time points. About 10-14 days after concluding interactions with Juno, women participated in a semistructured interview focused on (1) their personal experience with Juno, (2) user experience and user engagement, and (3) their opinions on future technological advancements. Interview transcripts, comprising 15 questions, were qualitatively evaluated and compared.
Finally, a text-mining analysis of transcripts was performed. Results: Similarities and differences emerged in women's experiences with Juno: they appreciated its esthetic but highlighted technical issues and desired clearer guidance. They found the content useful and pertinent to pregnancy but differed on when they deemed it most helpful. Women expressed interest in receiving increasingly personalized responses and in future integration with existing health care systems for better support. Accordingly, they generally viewed Juno as an effective momentary support but emphasized the need for human interaction in mental health care, particularly if increasingly personalized. Further concerns included overreliance on chatbots when seeking psychological support and the importance of clearly educating users on the chatbot's limitations. Conclusions: Overall, the results highlighted both the positive aspects and the shortcomings of the chatbot-based intervention, providing insight into its refinement and future developments. However, women stressed the need to balance technological support with human interactions, particularly when the intervention extends beyond a preventive mental health context, to allow closer and more reliable monitoring.
UR - https://formative.jmir.org/2024/1/e58653 UR - http://dx.doi.org/10.2196/58653 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/58653 ER - TY - JOUR AU - Ming, Shuai AU - Guo, Qingge AU - Cheng, Wenjun AU - Lei, Bo PY - 2024/8/13 TI - Influence of Model Evolution and System Roles on ChatGPT's Performance in Chinese Medical Licensing Exams: Comparative Study JO - JMIR Med Educ SP - e52784 VL - 10 KW - ChatGPT KW - Chinese National Medical Licensing Examination KW - large language models KW - medical education KW - system role KW - LLM KW - LLMs KW - language model KW - language models KW - artificial intelligence KW - chatbot KW - chatbots KW - conversational agent KW - conversational agents KW - exam KW - exams KW - examination KW - examinations KW - OpenAI KW - answer KW - answers KW - response KW - responses KW - accuracy KW - performance KW - China KW - Chinese N2 - Background: With the increasing application of large language models like ChatGPT in various industries, its potential in the medical domain, especially in standardized examinations, has become a focal point of research. Objective: The aim of this study is to assess the clinical performance of ChatGPT, focusing on its accuracy and reliability in the Chinese National Medical Licensing Examination (CNMLE). Methods: The CNMLE 2022 question set, consisting of 500 single-answer multiple-choice questions, was reclassified into 15 medical subspecialties. Each question was tested 8 to 12 times in Chinese on the OpenAI platform from April 24 to May 15, 2023. Three key factors were considered: the version of GPT-3.5 and 4.0, the prompt's designation of system roles tailored to medical subspecialties, and repetition for coherence. A passing accuracy threshold was established as 60%. The χ2 tests and κ values were employed to evaluate the model's accuracy and consistency. Results: GPT-4.0 achieved a passing accuracy of 72.7%, which was significantly higher than that of GPT-3.5 (54%; P<.001).
The variability rate of repeated responses from GPT-4.0 was lower than that of GPT-3.5 (9% vs 19.5%; P<.001). However, both models showed relatively good response coherence, with κ values of 0.778 and 0.610, respectively. System roles numerically increased accuracy for both GPT-4.0 (0.3%-3.7%) and GPT-3.5 (1.3%-4.5%), and reduced variability by 1.7% and 1.8%, respectively (P>.05). In subgroup analysis, ChatGPT achieved comparable accuracy among different question types (P>.05). GPT-4.0 surpassed the accuracy threshold in 14 of 15 subspecialties, while GPT-3.5 did so in 7 of 15 on the first response. Conclusions: GPT-4.0 passed the CNMLE and outperformed GPT-3.5 in key areas such as accuracy, consistency, and medical subspecialty expertise. Adding a system role did not significantly enhance the model's reliability and answer coherence. GPT-4.0 showed promising potential in medical education and clinical practice, meriting further study. UR - https://mededu.jmir.org/2024/1/e52784 UR - http://dx.doi.org/10.2196/52784 ID - info:doi/10.2196/52784 ER - TY - JOUR AU - Cherrez-Ojeda, Ivan AU - Gallardo-Bastidas, C. Juan AU - Robles-Velasco, Karla AU - Osorio, F. María AU - Velez Leon, Maria Eleonor AU - Leon Velastegui, Manuel AU - Pauletto, Patrícia AU - Aguilar-Díaz, C. F. AU - Squassi, Aldo AU - González Eras, Patricia Susana AU - Cordero Carrasco, Erita AU - Chavez Gonzalez, Leonor Karol AU - Calderon, C. Juan AU - Bousquet, Jean AU - Bedbrook, Anna AU - Faytong-Haro, Marco PY - 2024/8/13 TI - Understanding Health Care Students' Perceptions, Beliefs, and Attitudes Toward AI-Powered Language Models: Cross-Sectional Study JO - JMIR Med Educ SP - e51757 VL - 10 KW - artificial intelligence KW - ChatGPT KW - education KW - health care KW - students N2 - Background: ChatGPT was not intended for use in health care, but it has potential benefits that depend on end-user understanding and acceptability, which is where health care students become crucial.
There is still a limited amount of research in this area. Objective: The primary aim of our study was to assess the frequency of ChatGPT use, the perceived level of knowledge, the perceived risks associated with its use, and the ethical issues, as well as attitudes toward the use of ChatGPT in the context of education in the field of health. In addition, we aimed to examine whether there were differences across groups based on demographic variables. The second part of the study aimed to assess the association between the frequency of use, the level of perceived knowledge, the level of risk perception, and the level of perception of ethics as predictive factors for participants' attitudes toward the use of ChatGPT. Methods: A cross-sectional survey was conducted from May to June 2023 encompassing students of medicine, nursing, dentistry, nutrition, and laboratory science across the Americas. The study used descriptive analysis, chi-square tests, and ANOVA to assess statistical significance across different categories. The study used several ordinal logistic regression models to analyze the impact of predictive factors (frequency of use, perception of knowledge, perception of risk, and ethics perception scores) on attitude as the dependent variable. The models were adjusted for gender, institution type, major, and country. Stata was used to conduct all the analyses. Results: Of 2661 health care students, 42.99% (n=1144) were unaware of ChatGPT. The median score of knowledge was "minimal" (median 2.00, IQR 1.00-3.00). Most respondents (median 2.61, IQR 2.11-3.11) regarded ChatGPT as neither ethical nor unethical. Most participants (median 3.89, IQR 3.44-4.34) "somewhat agreed" that ChatGPT (1) benefits health care settings, (2) provides trustworthy data, (3) is a helpful tool for clinical and educational medical information access, and (4) makes the work easier. In total, 70% (7/10) of people used it for homework.
As the perceived knowledge of ChatGPT increased, attitudes toward ChatGPT tended to be more favorable. Higher ethical consideration perception ratings increased the likelihood of considering ChatGPT as a source of trustworthy health care information (odds ratio [OR] 1.620, 95% CI 1.498-1.752), beneficial in medical issues (OR 1.495, 95% CI 1.452-1.539), and useful for medical literature (OR 1.494, 95% CI 1.426-1.564; P<.001 for all results). Conclusions: Over 40% of American health care students (1144/2661, 42.99%) were unaware of ChatGPT despite its extensive use in the health field. Our data revealed the positive attitudes toward ChatGPT and the desire to learn more about it. Medical educators must explore how chatbots may be included in undergraduate health care education programs. UR - https://mededu.jmir.org/2024/1/e51757 UR - http://dx.doi.org/10.2196/51757 UR - http://www.ncbi.nlm.nih.gov/pubmed/39137029 ID - info:doi/10.2196/51757 ER - TY - JOUR AU - Xian, Xuechang AU - Chang, Angela AU - Xiang, Yu-Tao AU - Liu, Tingchi Matthew PY - 2024/8/12 TI - Debate and Dilemmas Regarding Generative AI in Mental Health Care: Scoping Review JO - Interact J Med Res SP - e53672 VL - 13 KW - generative artificial intelligence KW - GAI KW - ChatGPT KW - mental health KW - scoping review KW - artificial intelligence KW - depression KW - anxiety KW - generative adversarial network KW - GAN KW - variational autoencoder KW - VAE N2 - Background: Mental disorders have ranked among the top 10 prevalent causes of burden on a global scale. Generative artificial intelligence (GAI) has emerged as a promising and innovative technological advancement that has significant potential in the field of mental health care. Nevertheless, there is a scarcity of research dedicated to examining and understanding the application landscape of GAI within this domain.
Objective: This review aims to inform the current state of GAI knowledge and identify its key uses in the mental health domain by consolidating relevant literature. Methods: Records were searched within 8 reputable sources including Web of Science, PubMed, IEEE Xplore, medRxiv, bioRxiv, Google Scholar, CNKI and Wanfang databases between 2013 and 2023. Our focus was on original, empirical research with either English or Chinese publications that use GAI technologies to benefit mental health. For an exhaustive search, we also checked the studies cited by relevant literature. Two reviewers were responsible for the data selection process, and all the extracted data were synthesized and summarized for brief and in-depth analyses depending on the GAI approaches used (traditional retrieval and rule-based techniques vs advanced GAI techniques). Results: In this review of 144 articles, 44 (30.6%) met the inclusion criteria for detailed analysis. Six key uses of advanced GAI emerged: mental disorder detection, counseling support, therapeutic application, clinical training, clinical decision-making support, and goal-driven optimization. Advanced GAI systems have been mainly focused on therapeutic applications (n=19, 43%) and counseling support (n=13, 30%), with clinical training being the least common. Most studies (n=28, 64%) focused broadly on mental health, while specific conditions such as anxiety (n=1, 2%), bipolar disorder (n=2, 5%), eating disorders (n=1, 2%), posttraumatic stress disorder (n=2, 5%), and schizophrenia (n=1, 2%) received limited attention. Despite prevalent use, the efficacy of ChatGPT in the detection of mental disorders remains insufficient. In addition, 100 articles on traditional GAI approaches were found, indicating diverse areas where advanced GAI could enhance mental health care. 
Conclusions: This study provides a comprehensive overview of the use of GAI in mental health care, which serves as a valuable guide for future research, practical applications, and policy development in this domain. While GAI demonstrates promise in augmenting mental health care services, its inherent limitations emphasize its role as a supplementary tool rather than a replacement for trained mental health providers. A conscientious and ethical integration of GAI techniques is necessary, ensuring a balanced approach that maximizes benefits while mitigating potential challenges in mental health care practices. UR - https://www.i-jmr.org/2024/1/e53672 UR - http://dx.doi.org/10.2196/53672 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/53672 ER - TY - JOUR AU - Yong, Xian Lorraine Pei AU - Tung, Min Joshua Yi AU - Lee, Yao Zi AU - Kuan, Sen Win AU - Chua, Teng Mui PY - 2024/8/9 TI - Performance of Large Language Models in Patient Complaint Resolution: Web-Based Cross-Sectional Survey JO - J Med Internet Res SP - e56413 VL - 26 KW - ChatGPT KW - large language models KW - artificial intelligence KW - patient complaint KW - health care complaint KW - empathy KW - efficiency KW - patient satisfaction KW - resource allocation N2 - Background: Patient complaints are a perennial challenge faced by health care institutions globally, requiring extensive time and effort from health care workers. Despite these efforts, patient dissatisfaction remains high. Recent studies on the use of large language models (LLMs) such as the GPT models developed by OpenAI in the health care sector have shown great promise, with the ability to provide more detailed and empathetic responses as compared to physicians. LLMs could potentially be used in responding to patient complaints to improve patient satisfaction and complaint response time. 
Objective: This study aims to evaluate the performance of LLMs in addressing patient complaints received by a tertiary health care institution, with the goal of enhancing patient satisfaction. Methods: Anonymized patient complaint emails and associated responses from the patient relations department were obtained. ChatGPT-4.0 (OpenAI, Inc) was provided with the same complaint email and tasked to generate a response. The complaints and the respective responses were uploaded onto a web-based questionnaire. Respondents were asked to rate both responses on a 10-point Likert scale for 4 items: appropriateness, completeness, empathy, and satisfaction. Participants were also asked to choose a preferred response at the end of each scenario. Results: There was a total of 188 respondents, of whom 115 (61.2%) were health care workers. A majority of the respondents, including both health care and non–health care workers, preferred replies from ChatGPT (n=164, 87.2% to n=183, 97.3%). GPT-4.0 responses were rated higher in all 4 assessed items with all median scores of 8 (IQR 7-9) compared to human responses (appropriateness 5, IQR 3-7; empathy 4, IQR 3-6; quality 5, IQR 3-6; satisfaction 5, IQR 3-6; P<.001) and had higher average word counts as compared to human responses (238 vs 76 words). Regression analyses showed that a higher word count was a statistically significant predictor of higher score in all 4 items, with every 1-word increment resulting in an increase in scores of between 0.015 and 0.019 (all P<.001). However, on subgroup analysis by authorship, this only held true for responses written by patient relations department staff and not those generated by ChatGPT, which received consistently high scores irrespective of response length. Conclusions: This study provides significant evidence supporting the effectiveness of LLMs in resolution of patient complaints. 
ChatGPT demonstrated superiority in terms of response appropriateness, empathy, quality, and overall satisfaction when compared against actual human responses to patient complaints. Future research can be done to measure the degree of improvement that artificial intelligence–generated responses can bring in terms of time savings, cost-effectiveness, patient satisfaction, and stress reduction for the health care system. UR - https://www.jmir.org/2024/1/e56413 UR - http://dx.doi.org/10.2196/56413 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/56413 ER - TY - JOUR AU - Wang, Yijie AU - Chen, Yining AU - Sheng, Jifang PY - 2024/8/8 TI - Assessing ChatGPT as a Medical Consultation Assistant for Chronic Hepatitis B: Cross-Language Study of English and Chinese JO - JMIR Med Inform SP - e56426 VL - 12 KW - chronic hepatitis B KW - artificial intelligence KW - large language models KW - chatbots KW - medical consultation KW - AI in health care KW - cross-linguistic study N2 - Background: Chronic hepatitis B (CHB) imposes substantial economic and social burdens globally. The management of CHB involves intricate monitoring and adherence challenges, particularly in regions like China, where a high prevalence of CHB intersects with health care resource limitations. This study explores the potential of ChatGPT-3.5, an emerging artificial intelligence (AI) assistant, to address these complexities. With notable capabilities in medical education and practice, ChatGPT-3.5's role is examined in managing CHB, particularly in regions with distinct health care landscapes. Objective: This study aimed to uncover insights into ChatGPT-3.5's potential and limitations in delivering personalized medical consultation assistance for CHB patients across diverse linguistic contexts. Methods: Questions sourced from published guidelines, online CHB communities, and search engines in English and Chinese were refined, translated, and compiled into 96 inquiries. 
Subsequently, these questions were presented to both ChatGPT-3.5 and ChatGPT-4.0 in independent dialogues. The responses were then evaluated by senior physicians, focusing on informativeness, emotional management, consistency across repeated inquiries, and cautionary statements regarding medical advice. Additionally, a true-or-false questionnaire was employed to further discern the variance in information accuracy for closed questions between ChatGPT-3.5 and ChatGPT-4.0. Results: Over half of the responses (228/370, 61.6%) from ChatGPT-3.5 were considered comprehensive. In contrast, ChatGPT-4.0 exhibited a higher percentage at 74.5% (172/222; P<.001). Notably, superior performance was evident in English, particularly in terms of informativeness and consistency across repeated queries. However, deficiencies were identified in emotional management guidance, with only 3.2% (6/186) in ChatGPT-3.5 and 8.1% (15/154) in ChatGPT-4.0 (P=.04). ChatGPT-3.5 included a disclaimer in 10.8% (24/222) of responses, while ChatGPT-4.0 included a disclaimer in 13.1% (29/222) of responses (P=.46). When responding to true-or-false questions, ChatGPT-4.0 achieved an accuracy rate of 93.3% (168/180), significantly surpassing ChatGPT-3.5's accuracy rate of 65.0% (117/180) (P<.001). Conclusions: In this study, ChatGPT demonstrated basic capabilities as a medical consultation assistant for CHB management. The choice of working language for ChatGPT-3.5 was considered a potential factor influencing its performance, particularly in the use of terminology and colloquial language, and this potentially affects its applicability within specific target populations. However, as an updated model, ChatGPT-4.0 exhibits improved information processing capabilities, overcoming the language impact on information accuracy. This suggests that the implications of model advancement on applications need to be considered when selecting large language models as medical consultation assistants. 
Given that both models performed inadequately in emotional guidance management, this study highlights the importance of providing specific language training and emotional management strategies when deploying ChatGPT for medical purposes. Furthermore, the tendency of these models to use disclaimers in conversations should be further investigated to understand the impact on patients' experiences in practical applications. UR - https://medinform.jmir.org/2024/1/e56426 UR - http://dx.doi.org/10.2196/56426 UR - http://www.ncbi.nlm.nih.gov/pubmed/39115930 ID - info:doi/10.2196/56426 ER - TY - JOUR AU - Chan, Sze Wai AU - Cheng, Yee Wing AU - Lok, Chun Samson Hoi AU - Cheah, Mun Amanda Kah AU - Lee, Win Anna Kai AU - Ng, Ying Albe Sin AU - Kowatsch, Tobias PY - 2024/8/7 TI - Assessing the Short-Term Efficacy of Digital Cognitive Behavioral Therapy for Insomnia With Different Types of Coaching: Randomized Controlled Comparative Trial JO - JMIR Ment Health SP - e51716 VL - 11 KW - insomnia KW - cognitive behavioral therapy KW - digital intervention KW - mobile health KW - mHealth KW - chatbot-based coaching KW - human support KW - mobile phone N2 - Background: Digital cognitive behavioral therapy for insomnia (dCBTi) is an effective intervention for treating insomnia. The findings regarding its efficacy compared to face-to-face cognitive behavioral therapy for insomnia are inconclusive but suggest that dCBTi might be inferior. The lack of human support and low treatment adherence are believed to be barriers to dCBTi achieving its optimal efficacy. However, there has yet to be a direct comparative trial of dCBTi with different types of coaching support. Objective: This study examines whether adding chatbot-based and human coaching would improve the treatment efficacy of, and adherence to, dCBTi. Methods: Overall, 129 participants (n=98, 76% women; age: mean 34.09, SD 12.05 y) whose scores on the Insomnia Severity Index [ISI] were greater than 9 were recruited. 
A randomized controlled comparative trial with 5 arms was conducted: dCBTi with chatbot-based coaching and therapist support (dCBTi-therapist), dCBTi with chatbot-based coaching and research assistant support, dCBTi with chatbot-based coaching only, dCBTi without any coaching, and digital sleep hygiene and self-monitoring control. Participants were blinded to the condition assignment and study hypotheses, and the outcomes were self-assessed using questionnaires administered on the web. The outcomes included measures of insomnia (the ISI and the Sleep Condition Indicator), mood disturbances, fatigue, daytime sleepiness, quality of life, dysfunctional beliefs about sleep, and sleep-related safety behaviors administered at baseline, after treatment, and at 4-week follow-up. Treatment adherence was measured by the completion of video sessions and sleep diaries. An intention-to-treat analysis was conducted. Results: Significant condition-by-time interaction effects showed that dCBTi recipients, regardless of having any coaching, had greater improvements in insomnia measured by the Sleep Condition Indicator (P=.003; d=0.45) but not the ISI (P=.86; d=−0.28), depressive symptoms (P<.001; d=−0.62), anxiety (P=.01; d=−0.40), fatigue (P=.02; d=−0.35), dysfunctional beliefs about sleep (P<.001; d=−0.53), and safety behaviors related to sleep (P=.001; d=−0.50) than those who received digital sleep hygiene and self-monitoring control. The addition of chatbot-based coaching and human support did not improve treatment efficacy. However, adding human support promoted greater reductions in fatigue (P=.03; d=−0.33) and sleep-related safety behaviors (P=.05; d=−0.30) than dCBTi with chatbot-based coaching only at 4-week follow-up. dCBTi-therapist had the highest video and diary completion rates compared to other conditions (video: 16/25, 60% in dCBTi-therapist vs <3/21, <25% in dCBTi without any coaching), indicating greater treatment adherence. 
Conclusions: Our findings support the efficacy of dCBTi in treating insomnia, reducing thoughts and behaviors that perpetuate insomnia, reducing mood disturbances and fatigue, and improving quality of life. Adding chatbot-based coaching and human support did not significantly improve the efficacy of dCBTi after treatment. However, adding human support had incremental benefits on reducing fatigue and behaviors that could perpetuate insomnia, and hence may improve long-term efficacy. Trial Registration: ClinicalTrials.gov NCT05136638; https://www.clinicaltrials.gov/study/NCT05136638 UR - https://mental.jmir.org/2024/1/e51716 UR - http://dx.doi.org/10.2196/51716 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/51716 ER - TY - JOUR AU - McBee, C. Joseph AU - Han, Y. Daniel AU - Liu, Li AU - Ma, Leah AU - Adjeroh, A. Donald AU - Xu, Dong AU - Hu, Gangqing PY - 2024/8/7 TI - Assessing ChatGPT's Competency in Addressing Interdisciplinary Inquiries on Chatbot Uses in Sports Rehabilitation: Simulation Study JO - JMIR Med Educ SP - e51157 VL - 10 KW - ChatGPT KW - chatbots KW - multirole-playing KW - interdisciplinary inquiry KW - medical education KW - sports medicine N2 - Background: ChatGPT showcases exceptional conversational capabilities and extensive cross-disciplinary knowledge. In addition, it can perform multiple roles in a single chat session. This unique multirole-playing feature positions ChatGPT as a promising tool for exploring interdisciplinary subjects. Objective: The aim of this study was to evaluate ChatGPT's competency in addressing interdisciplinary inquiries based on a case study exploring the opportunities and challenges of chatbot uses in sports rehabilitation. Methods: We developed a model termed PanelGPT to assess ChatGPT's competency in addressing interdisciplinary topics through simulated panel discussions. 
Taking chatbot uses in sports rehabilitation as an example of an interdisciplinary topic, we prompted ChatGPT through PanelGPT to role-play a physiotherapist, psychologist, nutritionist, artificial intelligence expert, and athlete in a simulated panel discussion. During the simulation, we posed questions to the panel while ChatGPT acted as both the panelists for responses and the moderator for steering the discussion. We performed the simulation using ChatGPT-4 and evaluated the responses by referring to the literature and our human expertise. Results: By tackling questions related to chatbot uses in sports rehabilitation with respect to patient education, physiotherapy, physiology, nutrition, and ethical considerations, responses from the ChatGPT-simulated panel discussion reasonably pointed to various benefits such as 24/7 support, personalized advice, automated tracking, and reminders. ChatGPT also correctly emphasized the importance of patient education, and identified challenges such as limited interaction modes, inaccuracies in emotion-related advice, assurance of data privacy and security, transparency in data handling, and fairness in model training. It also stressed that chatbots are to assist as a copilot, not to replace human health care professionals in the rehabilitation process. Conclusions: ChatGPT exhibits strong competency in addressing interdisciplinary inquiry by simulating multiple experts from complementary backgrounds, with significant implications in assisting medical education. 
UR - https://mededu.jmir.org/2024/1/e51157 UR - http://dx.doi.org/10.2196/51157 UR - http://www.ncbi.nlm.nih.gov/pubmed/39042885 ID - info:doi/10.2196/51157 ER - TY - JOUR AU - Kashyap, Nick AU - Sebastian, Tresa Ann AU - Lynch, Chris AU - Jansons, Paul AU - Maddison, Ralph AU - Dingler, Tilman AU - Oldenburg, Brian PY - 2024/8/7 TI - Engagement With Conversational Agent–Enabled Interventions in Cardiometabolic Disease Management: Protocol for a Systematic Review JO - JMIR Res Protoc SP - e52973 VL - 13 KW - cardiometabolic disease KW - cardiovascular disease KW - diabetes KW - chronic disease KW - chatbot KW - acceptability KW - technology acceptance model KW - design KW - natural language processing KW - adult KW - heart failure KW - digital health intervention KW - Australia KW - systematic review KW - meta-analysis KW - digital health KW - conversational agent–enabled KW - health informatics KW - management N2 - Background: Cardiometabolic diseases (CMDs) are a group of interrelated conditions, including heart failure and diabetes, that increase the risk of cardiovascular and metabolic complications. The rising number of Australians with CMDs has necessitated new strategies for those managing these conditions, such as digital health interventions. The effectiveness of digital health interventions in supporting people with CMDs is dependent on the extent to which users engage with the tools. Augmenting digital health interventions with conversational agents, technologies that interact with people using natural language, may enhance engagement because of their human-like attributes. To date, no systematic review has compiled evidence on how design features influence the engagement of conversational agent–enabled interventions supporting people with CMDs. This review seeks to address this gap, thereby guiding developers in creating more engaging and effective tools for CMD management. 
Objective: The aim of this systematic review is to synthesize evidence pertaining to conversational agent–enabled intervention design features and their impacts on the engagement of people managing CMD. Methods: The review is conducted in accordance with the Cochrane Handbook for Systematic Reviews of Interventions and reported in accordance with PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Searches will be conducted in the Ovid (Medline), Web of Science, and Scopus databases, which will be run again prior to manuscript submission. Inclusion criteria will consist of primary research studies reporting on conversational agent–enabled interventions, including measures of engagement, in adults with CMD. Data extraction will seek to capture the perspectives of people with CMD on the use of conversational agent–enabled interventions. Joanna Briggs Institute critical appraisal tools will be used to evaluate the overall quality of evidence collected. Results: This review was initiated in May 2023 and was registered with the International Prospective Register of Systematic Reviews (PROSPERO) in June 2023, prior to title and abstract screening. Full-text screening of articles was completed in July 2023 and data extraction began in August 2023. Final searches were conducted in April 2024 prior to finalizing the review and the manuscript was submitted for peer review in July 2024. Conclusions: This review will synthesize diverse observations pertaining to conversational agent–enabled intervention design features and their impacts on engagement among people with CMDs. These observations can be used to guide the development of more engaging conversational agent–enabled interventions, thereby increasing the likelihood of regular intervention use and improved CMD health outcomes. 
Additionally, this review will identify gaps in the literature in terms of how engagement is reported, thereby highlighting areas for future exploration and supporting researchers in advancing the understanding of conversational agent–enabled interventions. Trial Registration: PROSPERO CRD42023431579; https://tinyurl.com/55cxkm26 International Registered Report Identifier (IRRID): DERR1-10.2196/52973 UR - https://www.researchprotocols.org/2024/1/e52973 UR - http://dx.doi.org/10.2196/52973 UR - http://www.ncbi.nlm.nih.gov/pubmed/39110504 ID - info:doi/10.2196/52973 ER - TY - JOUR AU - Burns, Christina AU - Bakaj, Angela AU - Berishaj, Amonda AU - Hristidis, Vagelis AU - Deak, Pamela AU - Equils, Ozlem PY - 2024/8/6 TI - Use of Generative AI for Improving Health Literacy in Reproductive Health: Case Study JO - JMIR Form Res SP - e59434 VL - 8 KW - ChatGPT KW - chat-GPT KW - chatbots KW - chat-bot KW - chat-bots KW - artificial intelligence KW - AI KW - machine learning KW - ML KW - large language model KW - large language models KW - LLM KW - LLMs KW - natural language processing KW - NLP KW - deep learning KW - chatbot KW - Google Search KW - internet KW - communication KW - English proficiency KW - readability KW - health literacy KW - health information KW - health education KW - health related questions KW - health information seeking KW - health access KW - reproductive health KW - oral contraceptive KW - birth control KW - emergency contraceptive KW - comparison KW - clinical KW - patients N2 - Background: Patients find technology tools to be more approachable for seeking sensitive health-related information, such as reproductive health information. The inventive conversational ability of artificial intelligence (AI) chatbots, such as ChatGPT (OpenAI Inc), offers a potential means for patients to effectively locate answers to their health-related questions digitally. 
Objective: A pilot study was conducted to compare the novel ChatGPT with the existing Google Search technology for their ability to offer accurate, effective, and current information on the appropriate course of action after missing a dose of an oral contraceptive pill. Methods: A sequence of 11 questions, mimicking a patient inquiring about the action to take after missing a dose of an oral contraceptive pill, was input into ChatGPT as a cascade, given the conversational ability of ChatGPT. The questions were input into 4 different ChatGPT accounts, with the account holders being of various demographics, to evaluate potential differences and biases in the responses given to different account holders. The leading question, "what should I do if I missed a day of my oral contraception birth control?" alone was then input into Google Search, given its nonconversational nature. The results from the ChatGPT questions and the Google Search results for the leading question were evaluated on their readability, accuracy, and effective delivery of information. Results: The ChatGPT results were at an overall higher-grade reading level, took longer to read, and were less accurate, less current, and less effective in delivering information. In contrast, the Google Search resulting answer box and snippets were at a lower-grade reading level, had a shorter reading duration, were more current, referenced the origin of the information (transparent), and provided the information in various formats in addition to text. Conclusions: ChatGPT has room for improvement in accuracy, transparency, recency, and reliability before it can be equitably implemented into health care information delivery and provide the potential benefits it poses. However, AI may be used as a tool for providers to educate their patients in preferred, creative, and efficient ways, such as using AI to generate accessible short educational videos from health care provider–vetted information. 
Larger studies representing a diverse group of users are needed. UR - https://formative.jmir.org/2024/1/e59434 UR - http://dx.doi.org/10.2196/59434 UR - http://www.ncbi.nlm.nih.gov/pubmed/38986153 ID - info:doi/10.2196/59434 ER - TY - JOUR AU - He, Linwei AU - Basar, Erkan AU - Krahmer, Emiel AU - Wiers, Reinout AU - Antheunis, Marjolijn PY - 2024/8/6 TI - Effectiveness and User Experience of a Smoking Cessation Chatbot: Mixed Methods Study Comparing Motivational Interviewing and Confrontational Counseling JO - J Med Internet Res SP - e53134 VL - 26 KW - chatbot KW - smoking cessation KW - counseling KW - motivational interviewing KW - confrontational counseling KW - user experience KW - engagement N2 - Background: Cigarette smoking poses a major public health risk. Chatbots may serve as an accessible and useful tool to promote cessation due to their high accessibility and potential in facilitating long-term personalized interactions. To increase effectiveness and acceptability, there remains a need to identify and evaluate counseling strategies for these chatbots, an aspect that has not been comprehensively addressed in previous research. Objective: This study aims to identify effective counseling strategies for such chatbots to support smoking cessation. In addition, we sought to gain insights into smokers' expectations of and experiences with the chatbot. Methods: This mixed methods study incorporated a web-based experiment and semistructured interviews. Smokers (N=229) interacted with either a motivational interviewing (MI)–style (n=112, 48.9%) or a confrontational counseling–style (n=117, 51.1%) chatbot. Both cessation-related (ie, intention to quit and self-efficacy) and user experience–related outcomes (ie, engagement, therapeutic alliance, perceived empathy, and interaction satisfaction) were assessed. Semistructured interviews were conducted with 16 participants, 8 (50%) from each condition, and data were analyzed using thematic analysis. 
Results: Results from a multivariate ANOVA showed that participants had a significantly higher overall rating for the MI (vs confrontational counseling) chatbot. Follow-up discriminant analysis revealed that the better perception of the MI chatbot was mostly explained by the user experience–related outcomes, with cessation-related outcomes playing a lesser role. Exploratory analyses indicated that smokers in both conditions reported increased intention to quit and self-efficacy after the chatbot interaction. Interview findings illustrated several constructs (eg, affective attitude and engagement) explaining people's previous expectations and timely and retrospective experience with the chatbot. Conclusions: The results confirmed that chatbots are a promising tool in motivating smoking cessation and the use of MI can improve user experience. We did not find extra support for MI to motivate cessation and have discussed possible reasons. Smokers expressed both relational and instrumental needs in the quitting process. Implications for future research and practice are discussed. UR - https://www.jmir.org/2024/1/e53134 UR - http://dx.doi.org/10.2196/53134 UR - http://www.ncbi.nlm.nih.gov/pubmed/39106097 ID - info:doi/10.2196/53134 ER - TY - JOUR AU - Huang, Thomas AU - Safranek, Conrad AU - Socrates, Vimig AU - Chartash, David AU - Wright, Donald AU - Dilip, Monisha AU - Sangal, B. 
Rohit AU - Taylor, Andrew Richard PY - 2024/8/2 TI - Patient-Representing Population's Perceptions of GPT-Generated Versus Standard Emergency Department Discharge Instructions: Randomized Blind Survey Assessment JO - J Med Internet Res SP - e60336 VL - 26 KW - machine learning KW - artificial intelligence KW - large language models KW - natural language processing KW - ChatGPT KW - discharge instructions KW - emergency medicine KW - emergency department KW - surveys and questionnaires N2 - Background: Discharge instructions are a key form of documentation and patient communication in the time of transition from the emergency department (ED) to home. Discharge instructions are time-consuming and often underprioritized, especially in the ED, leading to discharge delays and possibly impersonal patient instructions. Generative artificial intelligence and large language models (LLMs) offer promising methods of creating high-quality and personalized discharge instructions; however, there exists a gap in understanding patient perspectives of LLM-generated discharge instructions. Objective: We aimed to assess the use of LLMs such as ChatGPT in synthesizing accurate and patient-accessible discharge instructions in the ED. Methods: We synthesized 5 unique, fictional ED encounters to emulate real ED encounters that included a diverse set of clinician history, physical notes, and nursing notes. These were passed to GPT-4 in Azure OpenAI Service (Microsoft) to generate LLM-generated discharge instructions. Standard discharge instructions were also generated for each of the 5 unique ED encounters. All GPT-generated and standard discharge instructions were then formatted into standardized after-visit summary documents. These after-visit summaries containing either GPT-generated or standard discharge instructions were randomly and blindly administered to Amazon MTurk respondents representing patient populations through Amazon MTurk Survey Distribution. 
Discharge instructions were assessed based on metrics of interpretability of significance, understandability, and satisfaction. Results: Our findings revealed that survey respondents' perspectives regarding GPT-generated and standard discharge instructions were significantly (P=.01) more favorable toward GPT-generated return precautions, and all other sections were considered noninferior to standard discharge instructions. Of the 156 survey respondents, GPT-generated discharge instructions were assigned favorable ratings, "agree" and "strongly agree," more frequently along the metric of interpretability of significance in discharge instruction subsections regarding diagnosis, procedures, treatment, post-ED medications or any changes to medications, and return precautions. Survey respondents found GPT-generated instructions to be more understandable when rating procedures, treatment, post-ED medications or medication changes, post-ED follow-up, and return precautions. Satisfaction with GPT-generated discharge instruction subsections was the most favorable in procedures, treatment, post-ED medications or medication changes, and return precautions. Wilcoxon rank-sum test of Likert responses revealed significant differences (P=.01) in the interpretability of significant return precautions in GPT-generated discharge instructions compared to standard discharge instructions but not for other evaluation metrics and discharge instruction subsections. Conclusions: This study demonstrates the potential for LLMs such as ChatGPT to act as a method of augmenting current documentation workflows in the ED to reduce the documentation burden of physicians. The ability of LLMs to provide tailored instructions for patients by improving readability and making instructions more applicable to patients could improve upon the methods of communication that currently exist. 
UR - https://www.jmir.org/2024/1/e60336 UR - http://dx.doi.org/10.2196/60336 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/60336 ER - TY - JOUR AU - Lee, Christine AU - Mohebbi, Matthew AU - O'Callaghan, Erin AU - Winsberg, Mirène PY - 2024/8/2 TI - Large Language Models Versus Expert Clinicians in Crisis Prediction Among Telemental Health Patients: Comparative Study JO - JMIR Ment Health SP - e58129 VL - 11 KW - mental health KW - telehealth KW - PHQ-9 KW - Patient Health Questionnaire-9 KW - suicidal ideation KW - AI KW - LLM KW - OpenAI KW - GPT-4 KW - generative pretrained transformer 4 KW - tele-mental health KW - large language model KW - clinician KW - clinicians KW - artificial intelligence KW - patient information KW - suicide KW - suicidal KW - mental disorder KW - suicide attempt KW - psychologist KW - psychologists KW - psychiatrist KW - psychiatrists KW - psychiatry KW - clinical setting KW - self-reported KW - treatment KW - medication KW - digital mental health KW - machine learning KW - language model KW - crisis KW - telemental health KW - tele health KW - e-health KW - digital health N2 - Background: Due to recent advances in artificial intelligence, large language models (LLMs) have emerged as a powerful tool for a variety of language-related tasks, including sentiment analysis and summarization of provider-patient interactions. However, there is limited research on these models in the area of crisis prediction. Objective: This study aimed to evaluate the performance of LLMs, specifically OpenAI's generative pretrained transformer 4 (GPT-4), in predicting current and future mental health crisis episodes using patient-provided information at intake among users of a national telemental health platform. 
Methods: Deidentified patient-provided data were pulled from specific intake questions of the Brightside telehealth platform, including the chief complaint, for 140 patients who indicated suicidal ideation (SI), and another 120 patients who later indicated SI with a plan during the course of treatment. Similar data were pulled for 200 randomly selected patients, treated during the same time period, who never endorsed SI. In total, 6 senior Brightside clinicians (3 psychologists and 3 psychiatrists) were shown patients' self-reported chief complaint and self-reported suicide attempt history but were blinded to the future course of treatment and other reported symptoms, including SI. They were asked a simple yes or no question regarding their prediction of endorsement of SI with plan, along with their confidence level about the prediction. GPT-4 was provided with similar information and asked to answer the same questions, enabling us to directly compare the performance of artificial intelligence and clinicians. Results: Overall, the clinicians' average precision (0.7) was higher than that of GPT-4 (0.6) in identifying the SI with plan at intake (n=140) versus no SI (n=200) when using the chief complaint alone, while sensitivity was higher for the GPT-4 (0.62) than the clinicians' average (0.53). The addition of suicide attempt history increased the clinicians' average sensitivity (0.59) and precision (0.77) while increasing the GPT-4 sensitivity (0.59) but decreasing the GPT-4 precision (0.54). Performance decreased comparatively when predicting future SI with plan (n=120) versus no SI (n=200) with a chief complaint only for the clinicians (average sensitivity=0.4; average precision=0.59) and the GPT-4 (sensitivity=0.46; precision=0.48). The addition of suicide attempt history increased performance comparatively for the clinicians (average sensitivity=0.46; average precision=0.69) and the GPT-4 (sensitivity=0.74; precision=0.48). 
Conclusions: GPT-4, with a simple prompt design, produced results on some metrics that approached those of a trained clinician. Additional work must be done before such a model can be piloted in a clinical setting. The model should undergo safety checks for bias, given evidence that LLMs can perpetuate the biases of the underlying data on which they are trained. We believe that LLMs hold promise for augmenting the identification of higher-risk patients at intake and potentially delivering more timely care to patients. UR - https://mental.jmir.org/2024/1/e58129 UR - http://dx.doi.org/10.2196/58129 UR - http://www.ncbi.nlm.nih.gov/pubmed/38876484 ID - info:doi/10.2196/58129 ER - TY - JOUR AU - Aljamaan, Fadi AU - Temsah, Mohamad-Hani AU - Altamimi, Ibraheem AU - Al-Eyadhy, Ayman AU - Jamal, Amr AU - Alhasan, Khalid AU - Mesallam, A. Tamer AU - Farahat, Mohamed AU - Malki, H. Khalid PY - 2024/7/31 TI - Reference Hallucination Score for Medical Artificial Intelligence Chatbots: Development and Usability Study JO - JMIR Med Inform SP - e54345 VL - 12 KW - artificial intelligence (AI) chatbots KW - reference hallucination KW - bibliographic verification KW - ChatGPT KW - Perplexity KW - SciSpace KW - Elicit KW - Bing N2 - Background: Artificial intelligence (AI) chatbots have recently gained use in medical practice by health care practitioners. Interestingly, the output of these AI chatbots was found to have varying degrees of hallucination in content and references. Such hallucinations generate doubts about their output and their implementation. Objective: The aim of our study was to propose a reference hallucination score (RHS) to evaluate the authenticity of AI chatbots' citations. Methods: Six AI chatbots were challenged with the same 10 medical prompts, requesting 10 references per prompt. The RHS is composed of 6 bibliographic items and the reference's relevance to prompts' keywords. 
RHS was calculated for each reference, prompt, and type of prompt (basic vs complex). The average RHS was calculated for each AI chatbot and compared across the different types of prompts and AI chatbots. Results: Bard failed to generate any references. ChatGPT 3.5 and Bing generated the highest RHS (score=11), while Elicit and SciSpace generated the lowest RHS (score=1), and Perplexity generated a middle RHS (score=7). The highest degree of hallucination was observed for reference relevancy to the prompt keywords (308/500, 61.6%), while the lowest was for reference titles (169/500, 33.8%). ChatGPT and Bing had comparable RHS (β coefficient=−0.069; P=.32), while Perplexity had significantly lower RHS than ChatGPT (β coefficient=−0.345; P<.001). AI chatbots generally had significantly higher RHS when prompted with scenarios or complex format prompts (β coefficient=0.486; P<.001). Conclusions: The variation in RHS underscores the necessity for a robust reference evaluation tool to improve the authenticity of AI chatbots. Further, the variations highlight the importance of verifying their output and citations. Elicit and SciSpace had negligible hallucination, while ChatGPT and Bing had critical hallucination levels. The proposed AI chatbots' RHS could contribute to ongoing efforts to enhance AI's general reliability in medical research. UR - https://medinform.jmir.org/2024/1/e54345 UR - http://dx.doi.org/10.2196/54345 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/54345 ER - TY - JOUR AU - Lawrence, R. Hannah AU - Schneider, A. Renee AU - Rubin, B. Susan AU - Matarić, J. Maja AU - McDuff, J. 
Daniel AU - Jones Bell, Megan PY - 2024/7/29 TI - The Opportunities and Risks of Large Language Models in Mental Health JO - JMIR Ment Health SP - e59479 VL - 11 KW - artificial intelligence KW - AI KW - generative AI KW - large language models KW - mental health KW - mental health education KW - language model KW - mental health care KW - health equity KW - ethical KW - development KW - deployment UR - https://mental.jmir.org/2024/1/e59479 UR - http://dx.doi.org/10.2196/59479 ID - info:doi/10.2196/59479 ER - TY - JOUR AU - Zhui, Li AU - Yhap, Nina AU - Liping, Liu AU - Zhengjie, Wang AU - Zhonghao, Xiong AU - Xiaoshu, Yuan AU - Hong, Cui AU - Xuexiu, Liu AU - Wei, Ren PY - 2024/7/25 TI - Impact of Large Language Models on Medical Education and Teaching Adaptations JO - JMIR Med Inform SP - e55933 VL - 12 KW - large language models KW - medical education KW - opportunities KW - challenges KW - critical thinking KW - educator UR - https://medinform.jmir.org/2024/1/e55933 UR - http://dx.doi.org/10.2196/55933 ID - info:doi/10.2196/55933 ER - TY - JOUR AU - Burke, B. Harry AU - Hoang, Albert AU - Lopreiato, O. Joseph AU - King, Heidi AU - Hemmer, Paul AU - Montgomery, Michael AU - Gagarin, Viktoria PY - 2024/7/25 TI - Assessing the Ability of a Large Language Model to Score Free-Text Medical Student Clinical Notes: Quantitative Study JO - JMIR Med Educ SP - e56342 VL - 10 KW - medical education KW - generative artificial intelligence KW - natural language processing KW - ChatGPT KW - generative pretrained transformer KW - standardized patients KW - clinical notes KW - free-text notes KW - history and physical examination KW - large language model KW - LLM KW - medical student KW - medical students KW - clinical information KW - artificial intelligence KW - AI KW - patients KW - patient KW - medicine N2 - Background: Teaching medical students the skills required to acquire, interpret, apply, and communicate clinical information is an integral part of medical education. 
A crucial aspect of this process involves providing students with feedback regarding the quality of their free-text clinical notes. Objective: The goal of this study was to assess the ability of ChatGPT 3.5, a large language model, to score medical students' free-text history and physical notes. Methods: This is a single-institution, retrospective study. Standardized patients learned a prespecified clinical case and, acting as the patient, interacted with medical students. Each student wrote a free-text history and physical note of their interaction. The students' notes were scored independently by the standardized patients and ChatGPT using a prespecified scoring rubric that consisted of 85 case elements. The measure of accuracy was percent correct. Results: The study population consisted of 168 first-year medical students. There was a total of 14,280 scores. The ChatGPT incorrect scoring rate was 1.0%, and the standardized patient incorrect scoring rate was 7.2%. The ChatGPT error rate was 86% lower than the standardized patient error rate. The ChatGPT mean incorrect scoring rate of 12 (SD 11) was significantly lower than the standardized patient mean incorrect scoring rate of 85 (SD 74; P=.002). Conclusions: ChatGPT demonstrated a significantly lower error rate compared to standardized patients. This is the first study to assess the ability of a generative pretrained transformer (GPT) program to score medical students' standardized patient-based free-text clinical notes. It is expected that, in the near future, large language models will provide real-time feedback to practicing physicians regarding their free-text notes. GPT artificial intelligence programs represent an important advance in medical education and medical practice. 
UR - https://mededu.jmir.org/2024/1/e56342 UR - http://dx.doi.org/10.2196/56342 ID - info:doi/10.2196/56342 ER - TY - JOUR AU - Unlu, Ozan AU - Pikcilingis, Aaron AU - Letourneau, Jonathan AU - Landman, Adam AU - Patel, Rajesh AU - Shenoy, S. Erica AU - Hashimoto, Dean AU - Kim, Marvel AU - Pellecer, Johnny AU - Zhang, Haipeng PY - 2024/7/25 TI - Implementation of a Web-Based Chatbot to Guide Hospital Employees in Returning to Work During the COVID-19 Pandemic: Development and Before-and-After Evaluation JO - JMIR Form Res SP - e43119 VL - 8 KW - chatbot KW - return to work KW - employee KW - health care personnel KW - COVID-19 KW - conversational agent KW - occupational health KW - support service KW - health care delivery KW - agile methodology KW - digital intervention KW - digital support KW - work policy KW - hospital staff N2 - Background: Throughout the COVID-19 pandemic, multiple policies and guidelines were issued and updated for health care personnel (HCP) for COVID-19 testing and returning to work after reporting symptoms, exposures, or infection. The high frequency of changes and complexity of the policies made it difficult for HCP to understand when they needed testing and were eligible to return to work (RTW), which increased calls to Occupational Health Services (OHS), creating a need for other tools to guide HCP. Chatbots have been used as novel tools to facilitate immediate responses to patients' and employees' queries about COVID-19, assess symptoms, and guide individuals to appropriate care resources. Objective: This study aims to describe the development of an RTW chatbot and report its impact on demand for OHS support services during the first Omicron variant surge. Methods: This study was conducted at Mass General Brigham, an integrated health care system with over 80,000 employees. The RTW chatbot was developed using an agile design methodology. 
We mapped the RTW policy into a unified flow diagram that included all required questions and recommendations, then built and tested the chatbot using the Microsoft Azure Healthbot Framework. Using chatbot data and OHS call data from December 10, 2021, to February 17, 2022, we compared OHS resource use before and after the deployment of the RTW chatbot, including the number of calls to the OHS hotline, wait times, call length, and time OHS hotline staff spent on the phone. We also assessed Centers for Disease Control and Prevention data for COVID-19 case trends during the study period. Results: In the 5 weeks post deployment, 5575 users used the RTW chatbot with a mean interaction time of 1 minute and 17 seconds. The highest engagement was on January 25, 2022, with 368 users, which was 2 weeks after the peak of the first Omicron surge in Massachusetts. Among users who completed all the chatbot questions, 461 (71.6%) met the RTW criteria. During the 10 weeks, the median (IQR) number of daily calls that OHS received before and after deployment of the chatbot were 633 (251-934) and 115 (62-167), respectively (U=163; P<.001). The median time from dialing the OHS phone number to hanging up decreased from 28 minutes and 22 seconds (IQR 25:14-31:05) to 6 minutes and 25 seconds (IQR 5:32-7:08) after chatbot deployment (U=169; P<.001). Over the 10 weeks, the median time OHS hotline staff spent on the phone declined from 3 hours and 11 minutes (IQR 2:32-4:15) per day to 47 (IQR 42-54) minutes (U=193; P<.001), saving approximately 16.8 hours per OHS staff member per week. Conclusions: Using the agile methodology, a chatbot can be rapidly designed and deployed for employees to efficiently receive guidance regarding RTW that complies with the complex and shifting RTW policies, which may reduce use of OHS resources. 
UR - https://formative.jmir.org/2024/1/e43119 UR - http://dx.doi.org/10.2196/43119 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/43119 ER - TY - JOUR AU - Liu, Mingxin AU - Okuhara, Tsuyoshi AU - Chang, XinYi AU - Shirabe, Ritsuko AU - Nishiie, Yuriko AU - Okada, Hiroko AU - Kiuchi, Takahiro PY - 2024/7/25 TI - Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis JO - J Med Internet Res SP - e60807 VL - 26 KW - large language model, ChatGPT, medical licensing examination, medical education KW - LLMs KW - NLP KW - natural language processing KW - artificial intelligence KW - language models KW - review methods KW - systematic KW - meta-analysis N2 - Background: Over the past 2 years, researchers have used various medical licensing examinations to test whether ChatGPT (OpenAI) possesses accurate medical knowledge. The performance of each version of ChatGPT on the medical licensing examination in multiple environments showed remarkable differences. At this stage, there is still a lack of a comprehensive understanding of the variability in ChatGPT's performance on different medical licensing examinations. Objective: In this study, we reviewed all studies on ChatGPT performance in medical licensing examinations up to March 2024. This review aims to contribute to the evolving discourse on artificial intelligence (AI) in medical education by providing a comprehensive analysis of the performance of ChatGPT in various environments. The insights gained from this systematic review will guide educators, policymakers, and technical experts to effectively and judiciously use AI in medical education. Methods: We searched the literature published between January 1, 2022, and March 29, 2024, by searching query strings in Web of Science, PubMed, and Scopus. 
Two authors screened the literature according to the inclusion and exclusion criteria, extracted data, and independently assessed the quality of the literature concerning Quality Assessment of Diagnostic Accuracy Studies-2. We conducted both qualitative and quantitative analyses. Results: A total of 45 studies on the performance of different versions of ChatGPT in medical licensing examinations were included in this study. GPT-4 achieved an overall accuracy rate of 81% (95% CI 78-84; P<.01), significantly surpassing the 58% (95% CI 53-63; P<.01) accuracy rate of GPT-3.5. GPT-4 passed the medical examinations in 26 of 29 cases, outperforming the average scores of medical students in 13 of 17 cases. Translating the examination questions into English improved GPT-3.5's performance but did not affect GPT-4. GPT-3.5 showed no difference in performance between examinations from English-speaking and non–English-speaking countries (P=.72), but GPT-4 performed significantly better on examinations from English-speaking countries (P=.02). Any type of prompt could significantly improve GPT-3.5's (P=.03) and GPT-4's (P<.01) performance. GPT-3.5 performed better on short-text questions than on long-text questions. The difficulty of the questions affected the performance of GPT-3.5 and GPT-4. In image-based multiple-choice questions (MCQs), ChatGPT's accuracy rate ranges from 13.1% to 100%. ChatGPT performed significantly worse on open-ended questions than on MCQs. Conclusions: GPT-4 demonstrates considerable potential for future use in medical education. However, due to its insufficient accuracy, inconsistent performance, and the challenges posed by differing medical policies and knowledge across countries, GPT-4 is not yet suitable for use in medical education. 
Trial Registration: PROSPERO CRD42024506687; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=506687 UR - https://www.jmir.org/2024/1/e60807 UR - http://dx.doi.org/10.2196/60807 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/60807 ER - TY - JOUR AU - Tung, Min Joshua Yi AU - Gill, Ravinder Sunil AU - Sng, Ren Gerald Gui AU - Lim, Zheng Daniel Yan AU - Ke, Yuhe AU - Tan, Fang Ting AU - Jin, Liyuan AU - Elangovan, Kabilan AU - Ong, Ling Jasmine Chiat AU - Abdullah, Rizal Hairil AU - Ting, Wei Daniel Shu AU - Chong, Wen Tsung PY - 2024/7/24 TI - Comparison of the Quality of Discharge Letters Written by Large Language Models and Junior Clinicians: Single-Blinded Study JO - J Med Internet Res SP - e57721 VL - 26 KW - artificial intelligence KW - AI KW - discharge summaries KW - continuity of care KW - large language model KW - LLM KW - junior clinician KW - letter writing KW - single-blinded KW - ChatGPT KW - urology KW - primary care KW - fictional electronic record KW - consultation note KW - referral letter KW - simulated environment N2 - Background: Discharge letters are a critical component in the continuity of care between specialists and primary care providers. However, these letters are time-consuming to write, underprioritized in comparison to direct clinical care, and are often tasked to junior doctors. Prior studies assessing the quality of discharge summaries written for inpatient hospital admissions show inadequacies in many domains. Large language models such as GPT have the ability to summarize large volumes of unstructured free text such as electronic medical records and have the potential to automate such tasks, providing time savings and consistency in quality. Objective: The aim of this study was to assess the performance of GPT-4 in generating discharge letters written from urology specialist outpatient clinics to primary care providers and to compare their quality against letters written by junior clinicians. 
Methods: Fictional electronic records were written by physicians simulating 5 common urology outpatient cases with long-term follow-up. Records comprised simulated consultation notes, referral letters and replies, and relevant discharge summaries from inpatient admissions. GPT-4 was tasked to write discharge letters for these cases with a specified target audience of primary care providers who would be continuing the patient's care. Prompts were written for safety, content, and style. Concurrently, junior clinicians were provided with the same case records and instructional prompts. GPT-4 output was assessed for instances of hallucination. A blinded panel of primary care physicians then evaluated the letters using a standardized questionnaire tool. Results: GPT-4 outperformed human counterparts in information provision (mean 4.32, SD 0.95 vs 3.70, SD 1.27; P=.03) and had no instances of hallucination. There were no statistically significant differences in the mean clarity (4.16, SD 0.95 vs 3.68, SD 1.24; P=.12), collegiality (4.36, SD 1.00 vs 3.84, SD 1.22; P=.05), conciseness (3.60, SD 1.12 vs 3.64, SD 1.27; P=.71), follow-up recommendations (4.16, SD 1.03 vs 3.72, SD 1.13; P=.08), and overall satisfaction (3.96, SD 1.14 vs 3.62, SD 1.34; P=.36) between the letters generated by GPT-4 and humans, respectively. Conclusions: Discharge letters written by GPT-4 had equivalent quality to those written by junior clinicians, without any hallucinations. This study provides a proof of concept that large language models can be useful and safe tools in clinical documentation. UR - https://www.jmir.org/2024/1/e57721 UR - http://dx.doi.org/10.2196/57721 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/57721 ER - TY - JOUR AU - Kamel Boulos, N. Maged AU - Dellavalle, Robert PY - 2024/7/24 TI - NVIDIA's "Chat with RTX" 
Custom Large Language Model and Personalized AI Chatbot Augments the Value of Electronic Dermatology Reference Material JO - JMIR Dermatol SP - e58396 VL - 7 KW - AI chatbots KW - artificial intelligence KW - AI KW - generative AI KW - large language models KW - dermatology KW - education KW - self-study KW - NVIDIA RTX KW - retrieval-augmented generation KW - RAG UR - https://derma.jmir.org/2024/1/e58396 UR - http://dx.doi.org/10.2196/58396 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/58396 ER - TY - JOUR AU - Wu, Fei Philip AU - Summers, Charlotte AU - Panesar, Arjun AU - Kaura, Amit AU - Zhang, Li PY - 2024/7/23 TI - AI Hesitancy and Acceptability—Perceptions of AI Chatbots for Chronic Health Management and Long COVID Support: Survey Study JO - JMIR Hum Factors SP - e51086 VL - 11 KW - AI hesitancy KW - chatbot KW - long COVID KW - diabetes KW - chronic disease management KW - technology acceptance KW - post–COVID-19 condition KW - artificial intelligence N2 - Background: Artificial intelligence (AI) chatbots have the potential to assist individuals with chronic health conditions by providing tailored information, monitoring symptoms, and offering mental health support. Despite their potential benefits, research on public attitudes toward health care chatbots is still limited. To effectively support individuals with long-term health conditions like long COVID (or post–COVID-19 condition), it is crucial to understand their perspectives and preferences regarding the use of AI chatbots. Objective: This study has two main objectives: (1) provide insights into AI chatbot acceptance among people with chronic health conditions, particularly adults older than 55 years and (2) explore the perceptions of using AI chatbots for health self-management and long COVID support. Methods: A web-based survey study was conducted between January and March 2023, specifically targeting individuals with diabetes and other chronic conditions. 
This particular population was chosen due to their potential awareness and ability to self-manage their condition. The survey aimed to capture data at multiple intervals, taking into consideration the public launch of ChatGPT, which could have potentially impacted public opinions during the project timeline. The survey received 1310 clicks and garnered 900 responses, resulting in a total of 888 usable data points. Results: Although past experience with chatbots (P<.001, 95% CI .110-.302) and online information seeking (P<.001, 95% CI .039-.084) are strong indicators of respondents' future adoption of health chatbots, they are in general skeptical or unsure about the use of AI chatbots for health care purposes. Less than one-third of the respondents (n=203, 30.1%) indicated that they were likely to use a health chatbot in the next 12 months if available. Most were uncertain about a chatbot's capability to provide accurate medical advice. However, people seemed more receptive to using voice-based chatbots for mental well-being, health data collection, and analysis. Half of the respondents with long COVID showed interest in using emotionally intelligent chatbots. Conclusions: AI hesitancy is not uniform across all health domains and user groups. Despite persistent AI hesitancy, there are promising opportunities for chatbots to offer support for chronic conditions in areas of lifestyle enhancement and mental well-being, potentially through voice-based user interfaces. 
UR - https://humanfactors.jmir.org/2024/1/e51086 UR - http://dx.doi.org/10.2196/51086 ID - info:doi/10.2196/51086 ER - TY - JOUR AU - Laymouna, Moustafa AU - Ma, Yuanchao AU - Lessard, David AU - Schuster, Tibor AU - Engler, Kim AU - Lebouché, Bertrand PY - 2024/7/23 TI - Roles, Users, Benefits, and Limitations of Chatbots in Health Care: Rapid Review JO - J Med Internet Res SP - e56930 VL - 26 KW - chatbot KW - conversational agent KW - conversational assistant KW - user-computer interface KW - digital health KW - mobile health KW - electronic health KW - telehealth KW - artificial intelligence KW - AI KW - health information technology N2 - Background: Chatbots, or conversational agents, have emerged as significant tools in health care, driven by advancements in artificial intelligence and digital technology. These programs are designed to simulate human conversations, addressing various health care needs. However, no comprehensive synthesis of health care chatbots' roles, users, benefits, and limitations is available to inform future research and application in the field. Objective: This review aims to describe health care chatbots' characteristics, focusing on their diverse roles in the health care pathway, user groups, benefits, and limitations. Methods: A rapid review of published literature from 2017 to 2023 was performed with a search strategy developed in collaboration with a health sciences librarian and implemented in the MEDLINE and Embase databases. Primary research studies reporting on chatbot roles or benefits in health care were included. Two reviewers dual-screened the search results. Extracted data on chatbot roles, users, benefits, and limitations were subjected to content analysis. Results: The review categorized chatbot roles into 2 themes: delivery of remote health services, including patient support, care management, education, skills building, and health behavior promotion, and provision of administrative assistance to health care providers. 
User groups spanned across patients with chronic conditions as well as patients with cancer; individuals focused on lifestyle improvements; and various demographic groups such as women, families, and older adults. Professionals and students in health care also emerged as significant users, alongside groups seeking mental health support, behavioral change, and educational enhancement. The benefits of health care chatbots were also classified into 2 themes: improvement of health care quality and efficiency and cost-effectiveness in health care delivery. The identified limitations encompassed ethical challenges, medicolegal and safety concerns, technical difficulties, user experience issues, and societal and economic impacts. Conclusions: Health care chatbots offer a wide spectrum of applications, potentially impacting various aspects of health care. While they are promising tools for improving health care efficiency and quality, their integration into the health care system must be approached with consideration of their limitations to ensure optimal, safe, and equitable use. UR - https://www.jmir.org/2024/1/e56930 UR - http://dx.doi.org/10.2196/56930 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/56930 ER - TY - JOUR AU - Bricker, B. Jonathan AU - Sullivan, Brianna AU - Mull, Kristin AU - Santiago-Torres, Margarita AU - Lavista Ferres, M. 
Juan PY - 2024/7/23 TI - Conversational Chatbot for Cigarette Smoking Cessation: Results From the 11-Step User-Centered Design Development Process and Randomized Controlled Trial JO - JMIR Mhealth Uhealth SP - e57318 VL - 12 KW - chatbot KW - conversational agent KW - conversational agents KW - digital therapeutics KW - smoking cessation KW - development KW - develop KW - design KW - smoking KW - smoke KW - smokers KW - quit KW - quitting KW - cessation KW - chatbots KW - large language model KW - LLM KW - LLMs KW - large language models KW - addict KW - addiction KW - addictions KW - mobile phone N2 - Background: Conversational chatbots are an emerging digital intervention for smoking cessation. No studies have reported on the entire development process of a cessation chatbot. Objective: We aim to report results of the user-centered design development process and randomized controlled trial for a novel and comprehensive quit smoking conversational chatbot called QuitBot. Methods: The 4 years of formative research for developing QuitBot followed an 11-step process: (1) specifying a conceptual model; (2) conducting content analysis of existing interventions (63 hours of intervention transcripts); (3) assessing user needs; (4) developing the chat's persona ("personality"); (5) prototyping content and persona; (6) developing full functionality; (7) programming the QuitBot; (8) conducting a diary study; (9) conducting a pilot randomized controlled trial (RCT); (10) reviewing results of the RCT; and (11) adding a free-form question and answer (QnA) function, based on user feedback from pilot RCT results. The process of adding a QnA function itself involved a three-step process: (1) generating QnA pairs, (2) fine-tuning large language models (LLMs) on QnA pairs, and (3) evaluating the LLM outputs. 
Results: We developed a quit smoking program spanning 42 days of 2- to 3-minute conversations covering topics ranging from motivations to quit, setting a quit date, choosing Food and Drug Administration–approved cessation medications, coping with triggers, and recovering from lapses and relapses. In a pilot RCT with 96% three-month outcome data retention, QuitBot demonstrated high user engagement and promising cessation rates compared to the National Cancer Institute's SmokefreeTXT text messaging program, particularly among those who viewed all 42 days of program content: 30-day, complete-case, point prevalence abstinence rates at 3-month follow-up were 63% (39/62) for QuitBot versus 38.5% (45/117) for SmokefreeTXT (odds ratio 2.58, 95% CI 1.34-4.99; P=.005). However, Facebook Messenger intermittently blocked participants' access to QuitBot, so we transitioned from Facebook Messenger to a stand-alone smartphone app as the communication channel. Participants' frustration with QuitBot's inability to answer their open-ended questions led us to develop a core conversational feature, enabling users to ask open-ended questions about quitting cigarette smoking and for the QuitBot to respond with accurate and professional answers. To support this functionality, we developed a library of 11,000 QnA pairs on topics associated with quitting cigarette smoking. Model testing results showed that Microsoft's Azure-based QnA maker effectively handled questions that matched our library of 11,000 QnA pairs. A fine-tuned, contextualized GPT-3.5 (OpenAI) responds to questions that are not within our library of QnA pairs. Conclusions: The development process yielded the first LLM-based quit smoking program delivered as a conversational chatbot. Iterative testing led to significant enhancements, including improvements to the delivery channel. A pivotal addition was the inclusion of a core LLM-supported conversational feature allowing users to ask open-ended questions. 
Trial Registration: ClinicalTrials.gov NCT03585231; https://clinicaltrials.gov/study/NCT03585231 UR - https://mhealth.jmir.org/2024/1/e57318 UR - http://dx.doi.org/10.2196/57318 UR - http://www.ncbi.nlm.nih.gov/pubmed/38913882 ID - info:doi/10.2196/57318 ER - TY - JOUR AU - Sezgin, Emre AU - Kocaballi, Baki A. AU - Dolce, Millie AU - Skeens, Micah AU - Militello, Lisa AU - Huang, Yungui AU - Stevens, Jack AU - Kemper, R. Alex PY - 2024/7/19 TI - Chatbot for Social Need Screening and Resource Sharing With Vulnerable Families: Iterative Design and Evaluation Study JO - JMIR Hum Factors SP - e57114 VL - 11 KW - social determinants of health KW - social needs KW - chatbot KW - conversational agent KW - primary care KW - digital health KW - iterative design KW - implementation KW - evaluation KW - usability KW - feasibility N2 - Background: Health outcomes are significantly influenced by unmet social needs. Although screening for social needs has become common in health care settings, there is often poor linkage to resources after needs are identified. The structural barriers (eg, staffing, time, and space) to helping address social needs could be overcome by a technology-based solution. Objective: This study aims to present the design and evaluation of a chatbot, DAPHNE (Dialog-Based Assistant Platform for Healthcare and Needs Ecosystem), which screens for social needs and links patients and families to resources. Methods: This research used a three-stage study approach: (1) an end-user survey to understand unmet needs and perception toward chatbots, (2) iterative design with interdisciplinary stakeholder groups, and (3) a feasibility and usability assessment. In study 1, a web-based survey was conducted with low-income US resident households (n=201). Following that, in study 2, web-based sessions were held with an interdisciplinary group of stakeholders (n=10) using thematic and content analysis to inform the chatbot's design and development. 
Finally, in study 3, the assessment on feasibility and usability was completed via a mix of a web-based survey and focus group interviews following scenario-based usability testing with community health workers (family advocates; n=4) and social workers (n=9). We reported descriptive statistics and chi-square test results for the household survey. Content analysis and thematic analysis were used to analyze qualitative data. Usability score was descriptively reported. Results: Among the survey participants, employed and younger individuals reported a higher likelihood of using a chatbot to address social needs, in contrast to the oldest age group. Regarding designing the chatbot, the stakeholders emphasized the importance of provider-technology collaboration, inclusive conversational design, and user education. The participants found that the chatbot's capabilities met expectations and that the chatbot was easy to use (System Usability Scale score=72/100). However, there were common concerns about the accuracy of suggested resources, electronic health record integration, and trust with a chatbot. Conclusions: Chatbots can provide personalized feedback for families to identify and meet social needs. Our study highlights the importance of user-centered iterative design and development of chatbots for social needs. Future research should examine the efficacy, cost-effectiveness, and scalability of chatbot interventions to address social needs. 
UR - https://humanfactors.jmir.org/2024/1/e57114 UR - http://dx.doi.org/10.2196/57114 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/57114 ER - TY - JOUR AU - Wu, Qingxia AU - Li, Huali AU - Wang, Yan AU - Bai, Yan AU - Wu, Yaping AU - Yu, Xuan AU - Li, Xiaodong AU - Dong, Pei AU - Xue, Jon AU - Shen, Dinggang AU - Wang, Meiyun PY - 2024/7/17 TI - Evaluating Large Language Models for Automated Reporting and Data Systems Categorization: Cross-Sectional Study JO - JMIR Med Inform SP - e55799 VL - 12 KW - Radiology Reporting and Data Systems KW - LI-RADS KW - Lung-RADS KW - O-RADS KW - large language model KW - ChatGPT KW - chatbot KW - chatbots KW - categorization KW - recommendation KW - recommendations KW - accuracy N2 - Background: Large language models show promise for improving radiology workflows, but their performance on structured radiological tasks such as Reporting and Data Systems (RADS) categorization remains unexplored. Objective: This study aims to evaluate 3 large language model chatbots (Claude-2, GPT-3.5, and GPT-4) on assigning RADS categories to radiology reports and assess the impact of different prompting strategies. Methods: This cross-sectional study compared 3 chatbots using 30 radiology reports (10 per RADS criteria), using a 3-level prompting strategy: zero-shot, few-shot, and guideline PDF-informed prompts. The cases were grounded in Liver Imaging Reporting & Data System (LI-RADS) version 2018, Lung CT (computed tomography) Screening Reporting & Data System (Lung-RADS) version 2022, and Ovarian-Adnexal Reporting & Data System (O-RADS) magnetic resonance imaging, meticulously prepared by board-certified radiologists. Each report underwent 6 assessments. Two blinded reviewers assessed the chatbots' response at patient-level RADS categorization and overall ratings. The agreement across repetitions was assessed using Fleiss κ. 
Results: Claude-2 achieved the highest accuracy in overall ratings with few-shot prompts and guideline PDFs (prompt-2), attaining 57% (17/30) average accuracy over 6 runs and 50% (15/30) accuracy with k-pass voting. Without prompt engineering, all chatbots performed poorly. The introduction of a structured exemplar prompt (prompt-1) increased the accuracy of overall ratings for all chatbots. Providing prompt-2 further improved Claude-2's performance, an enhancement not replicated by GPT-4. The interrun agreement was substantial for Claude-2 (κ=0.66 for overall rating and κ=0.69 for RADS categorization), fair for GPT-4 (κ=0.39 for both), and fair for GPT-3.5 (κ=0.21 for overall rating and κ=0.39 for RADS categorization). All chatbots showed significantly higher accuracy with LI-RADS version 2018 than with Lung-RADS version 2022 and O-RADS (P<.05); with prompt-2, Claude-2 achieved the highest overall rating accuracy of 75% (45/60) in LI-RADS version 2018. Conclusions: When equipped with structured prompts and guideline PDFs, Claude-2 demonstrated potential in assigning RADS categories to radiology cases according to established criteria such as LI-RADS version 2018. However, the current generation of chatbots lags in accurately categorizing cases based on more recent RADS criteria. 
UR - https://medinform.jmir.org/2024/1/e55799 UR - http://dx.doi.org/10.2196/55799 UR - http://www.ncbi.nlm.nih.gov/pubmed/39018102 ID - info:doi/10.2196/55799 ER - TY - JOUR AU - Anisha, Azmin Sadia AU - Sen, Arkendu AU - Bain, Chris PY - 2024/7/16 TI - Evaluating the Potential and Pitfalls of AI-Powered Conversational Agents as Humanlike Virtual Health Carers in the Remote Management of Noncommunicable Diseases: Scoping Review JO - J Med Internet Res SP - e56114 VL - 26 KW - conversational agents KW - artificial intelligence KW - noncommunicable disease KW - self-management KW - remote monitoring KW - mobile phone N2 - Background: The rising prevalence of noncommunicable diseases (NCDs) worldwide and the high recent mortality rates (74.4%) associated with them, especially in low- and middle-income countries, is causing a substantial global burden of disease, necessitating innovative and sustainable long-term care solutions. Objective: This scoping review aims to investigate the impact of artificial intelligence (AI)–based conversational agents (CAs), including chatbots, voicebots, and anthropomorphic digital avatars, as human-like health caregivers in the remote management of NCDs, as well as identify critical areas for future research and provide insights into how these technologies might be used effectively in health care to personalize NCD management strategies. Methods: A broad literature search was conducted in July 2023 in 6 electronic databases (Ovid MEDLINE, Embase, PsycINFO, PubMed, CINAHL, and Web of Science) using the search terms "conversational agents," "artificial intelligence," and "noncommunicable diseases," including their associated synonyms. We also manually searched gray literature using sources such as ProQuest Central, ResearchGate, ACM Digital Library, and Google Scholar. 
We included empirical studies published in English from January 2010 to July 2023 focusing solely on health care–oriented applications of CAs used for remote management of NCDs. The narrative synthesis approach was used to collate and summarize the relevant information extracted from the included studies. Results: The literature search yielded a total of 43 studies that matched the inclusion criteria. Our review unveiled four significant findings: (1) higher user acceptance and compliance with anthropomorphic and avatar-based CAs for remote care; (2) an existing gap in the development of personalized, empathetic, and contextually aware CAs for effective emotional and social interaction with users, along with limited consideration of ethical concerns such as data privacy and patient safety; (3) inadequate evidence of the efficacy of CAs in NCD self-management despite a moderate to high level of optimism among health care professionals regarding CAs' potential in remote health care; and (4) CAs primarily being used for supporting nonpharmacological interventions such as behavioral or lifestyle modifications and patient education for the self-management of NCDs. Conclusions: This review makes a unique contribution to the field by not only providing a quantifiable impact analysis but also identifying the areas requiring imminent scholarly attention for the ethical, empathetic, and efficacious implementation of AI in NCD care. This serves as an academic cornerstone for future research in AI-assisted health care for NCD management. 
Trial Registration: Open Science Framework; https://doi.org/10.17605/OSF.IO/GU5PX UR - https://www.jmir.org/2024/1/e56114 UR - http://dx.doi.org/10.2196/56114 UR - http://www.ncbi.nlm.nih.gov/pubmed/39012688 ID - info:doi/10.2196/56114 ER - TY - JOUR AU - Jo, Eunbeen AU - Song, Sanghoun AU - Kim, Jong-Ho AU - Lim, Subin AU - Kim, Hyeon Ju AU - Cha, Jung-Joon AU - Kim, Young-Min AU - Joo, Joon Hyung PY - 2024/7/8 TI - Assessing GPT-4's Performance in Delivering Medical Advice: Comparative Analysis With Human Experts JO - JMIR Med Educ SP - e51282 VL - 10 KW - GPT-4 KW - medical advice KW - ChatGPT KW - cardiology KW - cardiologist KW - heart KW - advice KW - recommendation KW - recommendations KW - linguistic KW - linguistics KW - artificial intelligence KW - NLP KW - natural language processing KW - chatbot KW - chatbots KW - conversational agent KW - conversational agents KW - response KW - responses N2 - Background: Accurate medical advice is paramount in ensuring optimal patient care, and misinformation can lead to misguided decisions with potentially detrimental health outcomes. The emergence of large language models (LLMs) such as OpenAI's GPT-4 has spurred interest in their potential health care applications, particularly in automated medical consultation. Yet, rigorous investigations comparing their performance to human experts remain sparse. Objective: This study aims to compare the medical accuracy of GPT-4 with human experts in providing medical advice using real-world user-generated queries, with a specific focus on cardiology. It also sought to analyze the performance of GPT-4 and human experts in specific question categories, including drug or medication information and preliminary diagnoses. Methods: We collected 251 pairs of cardiology-specific questions from general users and answers from human experts via an internet portal. GPT-4 was tasked with generating responses to the same questions. 
Three independent cardiologists (SL, JHK, and JJC) evaluated the answers provided by both human experts and GPT-4. Using a computer interface, each evaluator compared the pairs and determined which answer was superior, and they quantitatively measured the clarity and complexity of the questions as well as the accuracy and appropriateness of the responses, applying a 3-tiered grading scale (low, medium, and high). Furthermore, a linguistic analysis was conducted to compare the length and vocabulary diversity of the responses using word count and type-token ratio. Results: GPT-4 and human experts displayed comparable efficacy in medical accuracy ("GPT-4 is better" at 132/251, 52.6% vs "Human expert is better" at 119/251, 47.4%). In accuracy level categorization, humans had more high-accuracy responses than GPT-4 (50/237, 21.1% vs 30/238, 12.6%) but also a greater proportion of low-accuracy responses (11/237, 4.6% vs 1/238, 0.4%; P=.001). GPT-4 responses were generally longer and used a less diverse vocabulary than those of human experts, potentially enhancing their comprehensibility for general users (sentence count: mean 10.9, SD 4.2 vs mean 5.9, SD 3.7; P<.001; type-token ratio: mean 0.69, SD 0.07 vs mean 0.79, SD 0.09; P<.001). Nevertheless, human experts outperformed GPT-4 in specific question categories, notably those related to drug or medication information and preliminary diagnoses. These findings highlight the limitations of GPT-4 in providing advice based on clinical experience. Conclusions: GPT-4 has shown promising potential in automated medical consultation, with comparable medical accuracy to human experts. However, challenges remain particularly in the realm of nuanced clinical judgment. Future improvements in LLMs may require the integration of specific clinical reasoning pathways and regulatory oversight for safe use. Further research is needed to understand the full potential of LLMs across various medical specialties and conditions. 
UR - https://mededu.jmir.org/2024/1/e51282 UR - http://dx.doi.org/10.2196/51282 ID - info:doi/10.2196/51282 ER - TY - JOUR AU - Hassanipour, Soheil AU - Nayak, Sandeep AU - Bozorgi, Ali AU - Keivanlou, Mohammad-Hossein AU - Dave, Tirth AU - Alotaibi, Abdulhadi AU - Joukar, Farahnaz AU - Mellatdoust, Parinaz AU - Bakhshi, Arash AU - Kuriyakose, Dona AU - Polisetty, D. Lakshmi AU - Chimpiri, Mallika AU - Amini-Salehi, Ehsan PY - 2024/7/8 TI - The Ability of ChatGPT in Paraphrasing Texts and Reducing Plagiarism: A Descriptive Analysis JO - JMIR Med Educ SP - e53308 VL - 10 KW - ChatGPT KW - paraphrasing KW - text generation KW - prompts KW - academic journals KW - plagiarize KW - plagiarism KW - paraphrase KW - wording KW - LLM KW - LLMs KW - language model KW - language models KW - prompt KW - generative KW - artificial intelligence KW - NLP KW - natural language processing KW - rephrase KW - plagiarizing KW - honesty KW - integrity KW - texts KW - text KW - textual KW - generation KW - large language model KW - large language models N2 - Background: The introduction of ChatGPT by OpenAI has garnered significant attention. Among its capabilities, paraphrasing stands out. Objective: This study aims to investigate the satisfactory levels of plagiarism in the paraphrased text produced by this chatbot. Methods: Three texts of varying lengths were presented to ChatGPT. ChatGPT was then instructed to paraphrase the provided texts using five different prompts. In the subsequent stage of the study, the texts were divided into separate paragraphs, and ChatGPT was requested to paraphrase each paragraph individually. Lastly, in the third stage, ChatGPT was asked to paraphrase the texts it had previously generated. Results: The average plagiarism rate in the texts generated by ChatGPT was 45% (SD 10%). ChatGPT exhibited a substantial reduction in plagiarism for the provided texts (mean difference −0.51, 95% CI −0.54 to −0.48; P<.001). 
Furthermore, when comparing the second attempt with the initial attempt, a significant decrease in the plagiarism rate was observed (mean difference −0.06, 95% CI −0.08 to −0.03; P<.001). The number of paragraphs in the texts demonstrated a noteworthy association with the percentage of plagiarism, with texts consisting of a single paragraph exhibiting the lowest plagiarism rate (P<.001). Conclusion: Although ChatGPT demonstrates a notable reduction of plagiarism within texts, the existing levels of plagiarism remain relatively high. This underscores a crucial caution for researchers when incorporating this chatbot into their work. UR - https://mededu.jmir.org/2024/1/e53308 UR - http://dx.doi.org/10.2196/53308 ID - info:doi/10.2196/53308 ER - TY - JOUR AU - Ferrario, Andrea AU - Sedlakova, Jana AU - Trachsel, Manuel PY - 2024/7/2 TI - The Role of Humanization and Robustness of Large Language Models in Conversational Artificial Intelligence for Individuals With Depression: A Critical Analysis JO - JMIR Ment Health SP - e56569 VL - 11 KW - generative AI KW - large language models KW - large language model KW - LLM KW - LLMs KW - machine learning KW - ML KW - natural language processing KW - NLP KW - deep learning KW - depression KW - mental health KW - mental illness KW - mental disease KW - mental diseases KW - mental illnesses KW - artificial intelligence KW - AI KW - digital health KW - digital technology KW - digital intervention KW - digital interventions KW - ethics UR - https://mental.jmir.org/2024/1/e56569 UR - http://dx.doi.org/10.2196/56569 ID - info:doi/10.2196/56569 ER - TY - JOUR AU - Xu, Jie AU - Lu, Lu AU - Peng, Xinwei AU - Pang, Jiali AU - Ding, Jinru AU - Yang, Lingrui AU - Song, Huan AU - Li, Kang AU - Sun, Xin AU - Zhang, Shaoting PY - 2024/6/28 TI - Data Set and Benchmark (MedGPTEval) to Evaluate Responses From Large Language Models in Medicine: Evaluation Development and Validation JO - JMIR Med Inform SP - e57674 VL - 12 KW - ChatGPT KW - LLM KW 
- assessment KW - data set KW - benchmark KW - medicine N2 - Background: Large language models (LLMs) have achieved great progress in natural language processing tasks and demonstrated the potential for use in clinical applications. Despite their capabilities, LLMs in the medical domain are prone to generating hallucinations (not fully reliable responses). Hallucinations in LLMs' responses create substantial risks, potentially threatening patients' physical safety. Thus, to perceive and prevent this safety risk, it is essential to evaluate LLMs in the medical domain and build a systematic evaluation. Objective: We developed a comprehensive evaluation system, MedGPTEval, composed of criteria, medical data sets in Chinese, and publicly available benchmarks. Methods: First, a set of evaluation criteria was designed based on a comprehensive literature review. Second, existing candidate criteria were optimized by using a Delphi method with 5 experts in medicine and engineering. Third, 3 clinical experts designed medical data sets to interact with LLMs. Finally, benchmarking experiments were conducted on the data sets. The responses generated by chatbots based on LLMs were recorded for blind evaluations by 5 licensed medical experts. The evaluation criteria that were obtained covered medical professional capabilities, social comprehensive capabilities, contextual capabilities, and computational robustness, with 16 detailed indicators. The medical data sets include 27 medical dialogues and 7 case reports in Chinese. Three chatbots were evaluated: ChatGPT by OpenAI; ERNIE Bot by Baidu, Inc; and Doctor PuJiang (Dr PJ) by Shanghai Artificial Intelligence Laboratory. Results: Dr PJ outperformed ChatGPT and ERNIE Bot in the multiple-turn medical dialogues and case report scenarios. Dr PJ also outperformed ChatGPT in the semantic consistency rate and complete error rate category, indicating better robustness. 
However, Dr PJ had slightly lower scores in medical professional capabilities compared with ChatGPT in the multiple-turn dialogue scenario. Conclusions: MedGPTEval provides comprehensive criteria to evaluate chatbots by LLMs in the medical domain, open-source data sets, and benchmarks assessing 3 LLMs. Experimental results demonstrate that Dr PJ outperforms ChatGPT and ERNIE Bot in social and professional contexts. Therefore, such an assessment system can be easily adopted by researchers in this community to augment an open-source data set. UR - https://medinform.jmir.org/2024/1/e57674 UR - http://dx.doi.org/10.2196/57674 ID - info:doi/10.2196/57674 ER - TY - JOUR AU - Lahat, Adi AU - Sharif, Kassem AU - Zoabi, Narmin AU - Shneor Patt, Yonatan AU - Sharif, Yousra AU - Fisher, Lior AU - Shani, Uria AU - Arow, Mohamad AU - Levin, Roni AU - Klang, Eyal PY - 2024/6/27 TI - Assessing Generative Pretrained Transformers (GPT) in Clinical Decision-Making: Comparative Analysis of GPT-3.5 and GPT-4 JO - J Med Internet Res SP - e54571 VL - 26 KW - ChatGPT KW - chat-GPT KW - chatbot KW - chatbots KW - chat-bot KW - chat-bots KW - natural language processing KW - NLP KW - artificial intelligence KW - AI KW - machine learning KW - ML KW - algorithm KW - algorithms KW - predictive model KW - predictive models KW - predictive analytics KW - predictive system KW - practical model KW - practical models KW - internal medicine KW - ethics KW - ethical KW - ethical dilemma KW - ethical dilemmas KW - bioethics KW - emergency medicine KW - EM medicine KW - ED physician KW - emergency physician KW - emergency doctor N2 - Background: Artificial intelligence, particularly chatbot systems, is becoming an instrumental tool in health care, aiding clinical decision-making and patient engagement. 
Objective: This study aims to analyze the performance of ChatGPT-3.5 and ChatGPT-4 in addressing complex clinical and ethical dilemmas, and to illustrate their potential role in health care decision-making while comparing seniors' and residents' ratings, and specific question types. Methods: A total of 4 specialized physicians formulated 176 real-world clinical questions. A total of 8 senior physicians and residents assessed responses from GPT-3.5 and GPT-4 on a 1-5 scale across 5 categories: accuracy, relevance, clarity, utility, and comprehensiveness. Evaluations were conducted within internal medicine, emergency medicine, and ethics. Comparisons were made globally, between seniors and residents, and across classifications. Results: Both GPT models received high mean scores (4.4, SD 0.8 for GPT-4 and 4.1, SD 1.0 for GPT-3.5). GPT-4 outperformed GPT-3.5 across all rating dimensions, with seniors consistently rating responses higher than residents for both models. Specifically, seniors rated GPT-4 as more beneficial and complete (mean 4.6 vs 4.0 and 4.6 vs 4.1, respectively; P<.001), and GPT-3.5 similarly (mean 4.1 vs 3.7 and 3.9 vs 3.5, respectively; P<.001). Ethical queries received the highest ratings for both models, with mean scores reflecting consistency across accuracy and completeness criteria. Distinctions among question types were significant, particularly for the GPT-4 mean scores in completeness across emergency, internal, and ethical questions (4.2, SD 1.0; 4.3, SD 0.8; and 4.5, SD 0.7, respectively; P<.001), and for GPT-3.5's accuracy, beneficial, and completeness dimensions. Conclusions: ChatGPT's potential to assist physicians with medical issues is promising, with prospects to enhance diagnostics, treatments, and ethics. While integration into clinical workflows may be valuable, it must complement, not replace, human expertise. Continued research is essential to ensure safe and effective implementation in clinical environments. 
UR - https://www.jmir.org/2024/1/e54571 UR - http://dx.doi.org/10.2196/54571 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/54571 ER - TY - JOUR AU - Hirosawa, Takanobu AU - Harada, Yukinori AU - Mizuta, Kazuya AU - Sakamoto, Tetsu AU - Tokumasu, Kazuki AU - Shimizu, Taro PY - 2024/6/26 TI - Evaluating ChatGPT-4's Accuracy in Identifying Final Diagnoses Within Differential Diagnoses Compared With Those of Physicians: Experimental Study for Diagnostic Cases JO - JMIR Form Res SP - e59267 VL - 8 KW - decision support system KW - diagnostic errors KW - diagnostic excellence KW - diagnosis KW - large language model KW - LLM KW - natural language processing KW - GPT-4 KW - ChatGPT KW - diagnoses KW - physicians KW - artificial intelligence KW - AI KW - chatbots KW - medical diagnosis KW - assessment KW - decision-making support KW - application KW - applications KW - app KW - apps N2 - Background: The potential of artificial intelligence (AI) chatbots, particularly ChatGPT with GPT-4 (OpenAI), in assisting with medical diagnosis is an emerging research area. However, it is not yet clear how well AI chatbots can evaluate whether the final diagnosis is included in differential diagnosis lists. Objective: This study aims to assess the capability of GPT-4 in identifying the final diagnosis from differential-diagnosis lists and to compare its performance with that of physicians for case report series. Methods: We used a database of differential-diagnosis lists from case reports in the American Journal of Case Reports, corresponding to final diagnoses. These lists were generated by 3 AI systems: GPT-4, Google Bard (currently Google Gemini), and Large Language Models by Meta AI 2 (LLaMA2). The primary outcome was focused on whether GPT-4's evaluations identified the final diagnosis within these lists. None of these AIs received additional medical training or reinforcement. 
For comparison, 2 independent physicians also evaluated the lists, with any inconsistencies resolved by another physician. Results: The 3 AIs generated a total of 1176 differential diagnosis lists from 392 case descriptions. GPT-4's evaluations concurred with those of the physicians in 966 out of 1176 lists (82.1%). The Cohen κ coefficient was 0.63 (95% CI 0.56-0.69), indicating a fair to good agreement between GPT-4 and the physicians' evaluations. Conclusions: GPT-4 demonstrated a fair to good agreement in identifying the final diagnosis from differential-diagnosis lists, comparable to physicians for case report series. Its ability to compare differential diagnosis lists with final diagnoses suggests its potential to aid clinical decision-making support through diagnostic feedback. While GPT-4 showed a fair to good agreement for evaluation, its application in real-world scenarios and further validation in diverse clinical environments are essential to fully understand its utility in the diagnostic process. UR - https://formative.jmir.org/2024/1/e59267 UR - http://dx.doi.org/10.2196/59267 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/59267 ER - TY - JOUR AU - Kim, Jin Hong AU - Yang, Hyuk Jae AU - Chang, Dong-Gune AU - Lenke, G. Lawrence AU - Pizones, Javier AU - Castelein, René AU - Watanabe, Kota AU - Trobisch, D. Per AU - Mundis Jr, M. 
Gregory AU - Suh, Woo Seung AU - Suk, Se-Il PY - 2024/6/26 TI - Assessing the Reproducibility of the Structured Abstracts Generated by ChatGPT and Bard Compared to Human-Written Abstracts in the Field of Spine Surgery: Comparative Analysis JO - J Med Internet Res SP - e52001 VL - 26 KW - artificial intelligence KW - AI KW - ChatGPT KW - Bard KW - scientific abstract KW - orthopedic surgery KW - spine KW - journal guidelines KW - plagiarism KW - ethics KW - spine surgery KW - surgery KW - language model KW - chatbot KW - formatting guidelines KW - abstract N2 - Background: Due to recent advances in artificial intelligence (AI), language model applications can generate logical text output that is difficult to distinguish from human writing. ChatGPT (OpenAI) and Bard (subsequently rebranded as "Gemini"; Google AI) were developed using distinct approaches, but little has been studied about the difference in their capability to generate the abstract. The use of AI to write scientific abstracts in the field of spine surgery is the center of much debate and controversy. Objective: The objective of this study is to assess the reproducibility of the structured abstracts generated by ChatGPT and Bard compared to human-written abstracts in the field of spine surgery. Methods: In total, 60 abstracts dealing with spine sections were randomly selected from 7 reputable journals and used as ChatGPT and Bard input statements to generate abstracts based on supplied paper titles. A total of 174 abstracts, divided into human-written abstracts, ChatGPT-generated abstracts, and Bard-generated abstracts, were evaluated for compliance with the structured format of journal guidelines and consistency of content. The likelihood of plagiarism and AI output was assessed using the iThenticate and ZeroGPT programs, respectively. A total of 8 reviewers in the spinal field evaluated 30 randomly extracted abstracts to determine whether they were produced by AI or human authors. 
Results: The proportion of abstracts that met journal formatting guidelines was greater among ChatGPT abstracts (34/60, 56.6%) compared with those generated by Bard (6/54, 11.1%; P<.001). However, a higher proportion of Bard abstracts (49/54, 90.7%) had word counts that met journal guidelines compared with ChatGPT abstracts (30/60, 50%; P<.001). The similarity index was significantly lower among ChatGPT-generated abstracts (20.7%) compared with Bard-generated abstracts (32.1%; P<.001). The AI-detection program predicted that 21.7% (13/60) of the human group, 63.3% (38/60) of the ChatGPT group, and 87% (47/54) of the Bard group were possibly generated by AI, with an area under the curve value of 0.863 (P<.001). The mean detection rate by human reviewers was 53.8% (SD 11.2%), achieving a sensitivity of 56.3% and a specificity of 48.4%. A total of 56.3% (63/112) of the actual human-written abstracts and 55.9% (62/128) of AI-generated abstracts were recognized as human-written and AI-generated by human reviewers, respectively. Conclusions: Both ChatGPT and Bard can be used to help write abstracts, but most AI-generated abstracts are currently considered unethical due to high plagiarism and AI-detection rates. ChatGPT-generated abstracts appear to be superior to Bard-generated abstracts in meeting journal formatting guidelines. Because humans are unable to accurately distinguish abstracts written by humans from those produced by AI programs, it is crucial to exercise special caution and examine the ethical boundaries of using AI programs, including ChatGPT and Bard. 
UR - https://www.jmir.org/2024/1/e52001 UR - http://dx.doi.org/10.2196/52001 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/52001 ER - TY - JOUR AU - Zhu, Lingxuan AU - Mou, Weiming AU - Wu, Keren AU - Lai, Yancheng AU - Lin, Anqi AU - Yang, Tao AU - Zhang, Jian AU - Luo, Peng PY - 2024/6/26 TI - Multimodal ChatGPT-4V for Electrocardiogram Interpretation: Promise and Limitations JO - J Med Internet Res SP - e54607 VL - 26 KW - ChatGPT KW - ECG KW - electrocardiogram KW - multimodal KW - artificial intelligence KW - AI KW - large language model KW - diagnostic KW - quantitative analysis KW - clinical KW - clinicians KW - ECG interpretation KW - cardiovascular care KW - cardiovascular UR - https://www.jmir.org/2024/1/e54607 UR - http://dx.doi.org/10.2196/54607 UR - http://www.ncbi.nlm.nih.gov/pubmed/38764297 ID - info:doi/10.2196/54607 ER - TY - JOUR AU - Ulrich, Sandra AU - Lienhard, Natascha AU - Künzli, Hansjörg AU - Kowatsch, Tobias PY - 2024/6/26 TI - A Chatbot-Delivered Stress Management Coaching for Students (MISHA App): Pilot Randomized Controlled Trial JO - JMIR Mhealth Uhealth SP - e54945 VL - 12 KW - conversational agent KW - mobile health KW - mHealth KW - smartphone KW - stress management KW - lifestyle KW - behavior change KW - coaching KW - mobile phone N2 - Background: Globally, students face increasing mental health challenges, including elevated stress levels and declining well-being, leading to academic performance issues and mental health disorders. However, due to stigma and symptom underestimation, students rarely seek effective stress management solutions. Conversational agents in the health sector have shown promise in reducing stress, depression, and anxiety. Nevertheless, research on their effectiveness for students with stress remains limited. 
Objective: This study aims to develop a conversational agent–delivered stress management coaching intervention for students called MISHA and to evaluate its effectiveness, engagement, and acceptance. Methods: In an unblinded randomized controlled trial, Swiss students experiencing stress were recruited on the web. Using a 1:1 randomization ratio, participants (N=140) were allocated to either the intervention or waitlist control group. Treatment effectiveness on changes in the primary outcome, that is, perceived stress, and secondary outcomes, including depression, anxiety, psychosomatic symptoms, and active coping, were self-assessed and evaluated using ANOVA for repeated measure and general estimating equations. Results: The per-protocol analysis revealed evidence for improvement of stress, depression, and somatic symptoms with medium effect sizes (Cohen d=−0.36 to Cohen d=−0.60), while anxiety and active coping did not change (Cohen d=−0.29 and Cohen d=0.13). In the intention-to-treat analysis, similar results were found, indicating reduced stress (β estimate=−0.13, 95% CI −0.20 to −0.05; P<.001), depressive symptoms (β estimate=−0.23, 95% CI −0.38 to −0.08; P=.003), and psychosomatic symptoms (β estimate=−0.16, 95% CI −0.27 to −0.06; P=.003), while anxiety and active coping did not change. Overall, 60% (42/70) of the participants in the intervention group completed the coaching by completing the postintervention survey. They particularly appreciated the quality, quantity, credibility, and visual representation of information. While individual customization was rated the lowest, the target group fitting was perceived as high. Conclusions: Findings indicate that MISHA is feasible, acceptable, and effective in reducing perceived stress among students in Switzerland. Future research is needed with different populations, for example, in students with high stress levels or compared to active controls. 
Trial Registration: German Clinical Trials Register DRKS 00030004; https://drks.de/search/en/trial/DRKS00030004 UR - https://mhealth.jmir.org/2024/1/e54945 UR - http://dx.doi.org/10.2196/54945 UR - http://www.ncbi.nlm.nih.gov/pubmed/38922677 ID - info:doi/10.2196/54945 ER - TY - JOUR AU - Luo, Xufei AU - Chen, Fengxian AU - Zhu, Di AU - Wang, Ling AU - Wang, Zijun AU - Liu, Hui AU - Lyu, Meng AU - Wang, Ye AU - Wang, Qi AU - Chen, Yaolong PY - 2024/6/25 TI - Potential Roles of Large Language Models in the Production of Systematic Reviews and Meta-Analyses JO - J Med Internet Res SP - e56780 VL - 26 KW - large language model KW - ChatGPT KW - systematic review KW - chatbot KW - meta-analysis UR - https://www.jmir.org/2024/1/e56780 UR - http://dx.doi.org/10.2196/56780 UR - http://www.ncbi.nlm.nih.gov/pubmed/38819655 ID - info:doi/10.2196/56780 ER - TY - JOUR AU - Shikino, Kiyoshi AU - Shimizu, Taro AU - Otsuka, Yuki AU - Tago, Masaki AU - Takahashi, Hiromizu AU - Watari, Takashi AU - Sasaki, Yosuke AU - Iizuka, Gemmei AU - Tamura, Hiroki AU - Nakashima, Koichi AU - Kunitomo, Kotaro AU - Suzuki, Morika AU - Aoyama, Sayaka AU - Kosaka, Shintaro AU - Kawahigashi, Teiko AU - Matsumoto, Tomohiro AU - Orihara, Fumina AU - Morikawa, Toru AU - Nishizawa, Toshinori AU - Hoshina, Yoji AU - Yamamoto, Yu AU - Matsuo, Yuichiro AU - Unoki, Yuto AU - Kimura, Hirofumi AU - Tokushima, Midori AU - Watanuki, Satoshi AU - Saito, Takuma AU - Otsuka, Fumio AU - Tokuda, Yasuharu PY - 2024/6/21 TI - Evaluation of ChatGPT-Generated Differential Diagnosis for Common Diseases With Atypical Presentation: Descriptive Research JO - JMIR Med Educ SP - e58758 VL - 10 KW - atypical presentation KW - ChatGPT KW - common disease KW - diagnostic accuracy KW - diagnosis KW - patient safety N2 - Background: The persistence of diagnostic errors, despite advances in medical knowledge and diagnostics, highlights the importance of understanding atypical disease presentations and their contribution to 
mortality and morbidity. Artificial intelligence (AI), particularly generative pre-trained transformers like GPT-4, holds promise for improving diagnostic accuracy, but requires further exploration in handling atypical presentations. Objective: This study aimed to assess the diagnostic accuracy of ChatGPT in generating differential diagnoses for atypical presentations of common diseases, with a focus on the model's reliance on patient history during the diagnostic process. Methods: We used 25 clinical vignettes from the Journal of Generalist Medicine characterizing atypical manifestations of common diseases. Two general medicine physicians categorized the cases based on atypicality. ChatGPT was then used to generate differential diagnoses based on the clinical information provided. The concordance between AI-generated and final diagnoses was measured, with a focus on the top-ranked disease (top 1) and the top 5 differential diagnoses (top 5). Results: ChatGPT's diagnostic accuracy decreased with an increase in atypical presentation. For category 1 (C1) cases, the concordance rates were 17% (n=1) for the top 1 and 67% (n=4) for the top 5. Categories 3 (C3) and 4 (C4) showed a 0% concordance for top 1 and markedly lower rates for the top 5, indicating difficulties in handling highly atypical cases. The χ² test revealed no significant difference in the top 1 differential diagnosis accuracy between less atypical (C1+C2) and more atypical (C3+C4) groups (χ²1=2.07; n=25; P=.13). However, a significant difference was found in the top 5 analyses, with less atypical cases showing higher accuracy (χ²1=4.01; n=25; P=.048). Conclusions: ChatGPT-4 demonstrates potential as an auxiliary tool for diagnosing typical and mildly atypical presentations of common diseases. However, its performance declines with greater atypicality. 
The study findings underscore the need for AI systems to encompass a broader range of linguistic capabilities, cultural understanding, and diverse clinical scenarios to improve diagnostic utility in real-world settings. UR - https://mededu.jmir.org/2024/1/e58758 UR - http://dx.doi.org/10.2196/58758 ID - info:doi/10.2196/58758 ER - TY - JOUR AU - Chua, Xin Joelle Yan AU - Choolani, Mahesh AU - Chee, Ing Cornelia Yin AU - Yi, Huso AU - Chan, Huak Yiong AU - Lalor, Gabrielle Joan AU - Chong, Seng Yap AU - Shorey, Shefaly PY - 2024/6/21 TI - Parents' Perceptions of Their Parenting Journeys and a Mobile App Intervention (Parentbot—A Digital Healthcare Assistant): Qualitative Process Evaluation JO - J Med Internet Res SP - e56894 VL - 26 KW - perinatal KW - parents KW - mobile app KW - chatbot KW - qualitative study KW - interviews KW - experiences KW - mobile phone N2 - Background: Parents experience many challenges during the perinatal period. Mobile app–based interventions and chatbots show promise in delivering health care support for parents during the perinatal period. Objective: This descriptive qualitative process evaluation study aims to explore the perinatal experiences of parents in Singapore, as well as examine the user experiences of the mobile app–based intervention with an in-built chatbot titled Parentbot—a Digital Healthcare Assistant (PDA). Methods: A total of 20 heterosexual English-speaking parents were recruited via purposive sampling from a single tertiary hospital in Singapore. The parents (control group: 10/20, 50%; intervention group: 10/20, 50%) were also part of an ongoing randomized trial between November 2022 and August 2023 that aimed to evaluate the effectiveness of the PDA in improving parenting outcomes. Semistructured one-to-one interviews were conducted via Zoom from February to June 2023. All interviews were conducted in English, audio recorded, and transcribed verbatim. Data analysis was guided by the thematic analysis framework. 
The COREQ (Consolidated Criteria for Reporting Qualitative Research) checklist was used to guide the reporting of data. Results: Three themes with 10 subthemes describing parents' perceptions of their parenting journeys and their experiences with the PDA were identified. The main themes were (1) new babies, new troubles, and new wonders; (2) support system for the parents; and (3) reshaping perinatal support for future parents. Conclusions: Overall, the PDA provided parents with informational, socioemotional, and psychological support and could be used to supplement the perinatal care provided for future parents. To optimize users' experience with the PDA, the intervention could be equipped with a more sophisticated chatbot, equipped with more gamification features, and programmed to deliver personalized care to parents. Researchers and health care providers could also strive to promote more peer-to-peer interactions among users. The provision of continuous, holistic, and family-centered care by health care professionals could also be emphasized. Moreover, policy changes regarding maternity and paternity leaves, availability of infant care centers, and flexible work arrangements could be further explored to promote healthy work-family balance for parents. UR - https://www.jmir.org/2024/1/e56894 UR - http://dx.doi.org/10.2196/56894 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/56894 ER - TY - JOUR AU - Hartford, Anna AU - Stein, J. 
Dan PY - 2024/6/18 TI - The Machine Speaks: Conversational AI and the Importance of Effort to Relationships of Meaning JO - JMIR Ment Health SP - e53203 VL - 11 KW - artificial intelligence KW - AI KW - conversational AIs KW - generative AI KW - intimacy KW - human-machine interaction KW - interpersonal relationships KW - effort KW - psychotherapy KW - conversation UR - https://mental.jmir.org/2024/1/e53203 UR - http://dx.doi.org/10.2196/53203 UR - http://www.ncbi.nlm.nih.gov/pubmed/38889401 ID - info:doi/10.2196/53203 ER - TY - JOUR AU - Singla, Ashwani AU - Khanna, Ritvik AU - Kaur, Manpreet AU - Kelm, Karen AU - Zaiane, Osmar AU - Rosenfelt, Scott Cory AU - Bui, An Truong AU - Rezaei, Navid AU - Nicholas, David AU - Reformat, Z. Marek AU - Majnemer, Annette AU - Ogourtsova, Tatiana AU - Bolduc, Francois PY - 2024/6/18 TI - Developing a Chatbot to Support Individuals With Neurodevelopmental Disorders: Tutorial JO - J Med Internet Res SP - e50182 VL - 26 KW - chatbot KW - user interface KW - knowledge graph KW - neurodevelopmental disability KW - autism KW - intellectual disability KW - attention-deficit/hyperactivity disorder UR - https://www.jmir.org/2024/1/e50182 UR - http://dx.doi.org/10.2196/50182 UR - http://www.ncbi.nlm.nih.gov/pubmed/38888947 ID - info:doi/10.2196/50182 ER - TY - JOUR AU - Collins, Luke AU - Nicholson, Niamh AU - Lidbetter, Nicky AU - Smithson, Dave AU - Baker, Paul PY - 2024/6/17 TI - Implementation of Anxiety UK's Ask Anxia Chatbot Service: Lessons Learned JO - JMIR Hum Factors SP - e53897 VL - 11 KW - chatbots KW - anxiety disorders KW - corpus linguistics KW - conversational agents KW - web-based care UR - https://humanfactors.jmir.org/2024/1/e53897 UR - http://dx.doi.org/10.2196/53897 UR - http://www.ncbi.nlm.nih.gov/pubmed/38885016 ID - info:doi/10.2196/53897 ER - TY - JOUR AU - Masanneck, Lars AU - Schmidt, Linea AU - Seifert, Antonia AU - Kölsche, Tristan AU - Huntemann, Niklas AU - Jansen, Robin AU - Mehsin, Mohammed AU - 
Bernhard, Michael AU - Meuth, G. Sven AU - Böhm, Lennert AU - Pawlitzki, Marc PY - 2024/6/14 TI - Triage Performance Across Large Language Models, ChatGPT, and Untrained Doctors in Emergency Medicine: Comparative Study JO - J Med Internet Res SP - e53297 VL - 26 KW - emergency medicine KW - triage KW - artificial intelligence KW - large language models KW - ChatGPT KW - untrained doctors KW - doctor KW - doctors KW - comparative study KW - digital health KW - personnel KW - staff KW - cohort KW - Germany KW - German N2 - Background: Large language models (LLMs) have demonstrated impressive performances in various medical domains, prompting an exploration of their potential utility within the high-demand setting of emergency department (ED) triage. This study evaluated the triage proficiency of different LLMs and ChatGPT, an LLM-based chatbot, compared to professionally trained ED staff and untrained personnel. We further explored whether LLM responses could guide untrained staff in effective triage. Objective: This study aimed to assess the efficacy of LLMs and the associated product ChatGPT in ED triage compared to personnel of varying training status and to investigate if the models' responses can enhance the triage proficiency of untrained personnel. Methods: A total of 124 anonymized case vignettes were triaged by untrained doctors; different versions of currently available LLMs; ChatGPT; and professionally trained raters, who subsequently agreed on a consensus set according to the Manchester Triage System (MTS). The prototypical vignettes were adapted from cases at a tertiary ED in Germany. The main outcome was the level of agreement between raters' MTS level assignments, measured via quadratic-weighted Cohen κ. The extent of over- and undertriage was also determined. Notably, instances of ChatGPT were prompted using zero-shot approaches without extensive background information on the MTS. 
The tested LLMs included raw GPT-4, Llama 3 70B, Gemini 1.5, and Mixtral 8x7b. Results: GPT-4–based ChatGPT and untrained doctors showed substantial agreement with the consensus triage of professional raters (κ=mean 0.67, SD 0.037 and κ=mean 0.68, SD 0.056, respectively), significantly exceeding the performance of GPT-3.5–based ChatGPT (κ=mean 0.54, SD 0.024; P<.001). When untrained doctors used this LLM for second-opinion triage, there was a slight but statistically insignificant performance increase (κ=mean 0.70, SD 0.047; P=.97). Other tested LLMs performed similarly to or worse than GPT-4–based ChatGPT or showed odd triaging behavior with the used parameters. LLMs and ChatGPT models tended toward overtriage, whereas untrained doctors undertriaged. Conclusions: While LLMs and the LLM-based product ChatGPT do not yet match professionally trained raters, their best models' triage proficiency equals that of untrained ED doctors. In its current form, LLMs or ChatGPT thus did not demonstrate gold-standard performance in ED triage and, in the setting of this study, failed to significantly improve untrained doctors' triage when used as decision support. Notable performance enhancements in newer LLM versions over older ones hint at future improvements with further technological development and specific training. UR - https://www.jmir.org/2024/1/e53297 UR - http://dx.doi.org/10.2196/53297 UR - http://www.ncbi.nlm.nih.gov/pubmed/38875696 ID - info:doi/10.2196/53297 ER - TY - JOUR AU - Zhang, Fang AU - Liu, Xiaoliu AU - Wu, Wenyan AU - Zhu, Shiben PY - 2024/6/13 TI - Evolution of Chatbots in Nursing Education: Narrative Review JO - JMIR Med Educ SP - e54987 VL - 10 KW - nursing education KW - chatbots KW - artificial intelligence KW - narrative review KW - ChatGPT N2 - Background: The integration of chatbots in nursing education is a rapidly evolving area with potential transformative impacts. 
This narrative review aims to synthesize and analyze the existing literature on chatbots in nursing education. Objective: This study aims to comprehensively examine the temporal trends, international distribution, study designs, and implications of chatbots in nursing education. Methods: A comprehensive search was conducted across 3 databases (PubMed, Web of Science, and Embase) following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram. Results: A total of 40 articles met the eligibility criteria, with a notable increase in publications in 2023 (n=28, 70%). Temporal analysis revealed a notable surge in publications from 2021 to 2023, emphasizing the growing scholarly interest. Geographically, Taiwan province made substantial contributions (n=8, 20%), followed by the United States (n=6, 15%) and South Korea (n=4, 10%). Study designs varied, with reviews (n=8, 20%) and editorials (n=7, 18%) being predominant, showcasing the richness of research in this domain. Conclusions: Integrating chatbots into nursing education presents a promising yet relatively unexplored avenue. This review highlights the urgent need for original research, emphasizing the importance of ethical considerations. 
UR - https://mededu.jmir.org/2024/1/e54987 UR - http://dx.doi.org/10.2196/54987 ID - info:doi/10.2196/54987 ER - TY - JOUR AU - Srinivasan, Muthuvenkatachalam AU - Venugopal, Ambili AU - Venkatesan, Latha AU - Kumar, Rajesh PY - 2024/6/13 TI - Navigating the Pedagogical Landscape: Exploring the Implications of AI and Chatbots in Nursing Education JO - JMIR Nursing SP - e52105 VL - 7 KW - AI KW - artificial intelligence KW - ChatGPT KW - chatbots KW - nursing education KW - education KW - chatbot KW - nursing KW - ethical KW - ethics KW - ethical consideration KW - accessible KW - learning KW - efficiency KW - student KW - student engagement KW - student learning UR - https://nursing.jmir.org/2024/1/e52105 UR - http://dx.doi.org/10.2196/52105 UR - http://www.ncbi.nlm.nih.gov/pubmed/38870516 ID - info:doi/10.2196/52105 ER - TY - JOUR AU - Rubin, Matan AU - Arnon, Hadar AU - Huppert, D. Jonathan AU - Perry, Anat PY - 2024/6/11 TI - Considering the Role of Human Empathy in AI-Driven Therapy JO - JMIR Ment Health SP - e56529 VL - 11 KW - empathy KW - empathetic KW - empathic KW - artificial empathy KW - AI KW - artificial intelligence KW - mental health KW - machine learning KW - algorithm KW - algorithms KW - predictive model KW - predictive models KW - predictive analytics KW - predictive system KW - practical model KW - practical models KW - model KW - models KW - therapy KW - mental illness KW - mental illnesses KW - mental disease KW - mental diseases KW - mood disorder KW - mood disorders KW - emotion KW - emotions KW - e-mental health KW - digital mental health KW - internet-based therapy UR - https://mental.jmir.org/2024/1/e56529 UR - http://dx.doi.org/10.2196/56529 UR - http://www.ncbi.nlm.nih.gov/pubmed/38861302 ID - info:doi/10.2196/56529 ER - TY - JOUR AU - Sekhar, C. Tejas AU - Nayak, R. Yash AU - Abdoler, A. 
Emily PY - 2024/6/7 TI - A Use Case for Generative AI in Medical Education JO - JMIR Med Educ SP - e56117 VL - 10 KW - medical education KW - med ed KW - generative artificial intelligence KW - artificial intelligence KW - GAI KW - AI KW - Anki KW - flashcard KW - undergraduate medical education KW - UME UR - https://mededu.jmir.org/2024/1/e56117 UR - http://dx.doi.org/10.2196/56117 ID - info:doi/10.2196/56117 ER - TY - JOUR AU - Pendergrast, Tricia AU - Chalmers, Zachary PY - 2024/6/7 TI - Authors' Reply: A Use Case for Generative AI in Medical Education JO - JMIR Med Educ SP - e58370 VL - 10 KW - ChatGPT KW - undergraduate medical education KW - large language models UR - https://mededu.jmir.org/2024/1/e58370 UR - http://dx.doi.org/10.2196/58370 ID - info:doi/10.2196/58370 ER - TY - JOUR AU - Dagli, Marcel Mert AU - Oettl, Conrad Felix AU - Gujral, Jaskeerat AU - Malhotra, Kashish AU - Ghenbot, Yohannes AU - Yoon, W. Jang AU - Ozturk, K. Ali AU - Welch, C. William PY - 2024/6/7 TI - Clinical Accuracy, Relevance, Clarity, and Emotional Sensitivity of Large Language Models to Surgical Patient Questions: Cross-Sectional Study JO - JMIR Form Res SP - e56165 VL - 8 KW - artificial intelligence KW - AI KW - natural language processing KW - NLP KW - large language model KW - LLM KW - generative AI KW - cross-sectional study KW - health information KW - patient education KW - clinical accuracy KW - emotional sensitivity KW - surgical patient KW - surgery KW - surgical UR - https://formative.jmir.org/2024/1/e56165 UR - http://dx.doi.org/10.2196/56165 UR - http://www.ncbi.nlm.nih.gov/pubmed/38848553 ID - info:doi/10.2196/56165 ER - TY - JOUR AU - Meczner, András AU - Cohen, Nathan AU - Qureshi, Aleem AU - Reza, Maria AU - Sutaria, Shailen AU - Blount, Emily AU - Bagyura, Zsolt AU - Malak, Tamer PY - 2024/5/31 TI - Controlling Inputter Variability in Vignette Studies Assessing Web-Based Symptom Checkers: Evaluation of Current Practice and Recommendations for Isolated 
Accuracy Metrics JO - JMIR Form Res SP - e49907 VL - 8 KW - symptom checker KW - accuracy KW - vignette studies KW - variability KW - methods KW - triage KW - evaluation KW - vignette KW - performance KW - metrics KW - mobile phone N2 - Background: The rapid growth of web-based symptom checkers (SCs) is not matched by advances in quality assurance. Currently, there are no widely accepted criteria assessing SCs' performance. Vignette studies are widely used to evaluate SCs, measuring the accuracy of outcome. Accuracy behaves as a composite metric as it is affected by a number of individual SC- and tester-dependent factors. In contrast to clinical studies, vignette studies have a small number of testers. Hence, measuring accuracy alone in vignette studies may not provide a reliable assessment of performance due to tester variability. Objective: This study aims to investigate the impact of tester variability on the accuracy of outcome of SCs, using clinical vignettes. It further aims to investigate the feasibility of measuring isolated aspects of performance. Methods: Healthily's SC was assessed using 114 vignettes by 3 groups of 3 testers who processed vignettes with different instructions: free interpretation of vignettes (free testers), specified chief complaints (partially free testers), and specified chief complaints with strict instruction for answering additional symptoms (restricted testers). κ statistics were calculated to assess agreement of top outcome condition and recommended triage. Crude and adjusted accuracy was measured against a gold standard. Adjusted accuracy was calculated using only results of consultations identical to the vignette, following a review and selection process. A feasibility study for assessing symptom comprehension of SCs was performed using different variations of 51 chief complaints across 3 SCs. 
Results: Intertester agreement of most likely condition and triage was, respectively, 0.49 and 0.51 for the free tester group, 0.66 and 0.66 for the partially free group, and 0.72 and 0.71 for the restricted group. For the restricted group, accuracy ranged from 43.9% to 57% for individual testers, averaging 50.6% (SD 5.35%). Adjusted accuracy was 56.1%. Assessing symptom comprehension was feasible for all 3 SCs. Comprehension scores ranged from 52.9% to 68%. Conclusions: We demonstrated that by improving standardization of the vignette testing process, there is a significant improvement in the agreement of outcome between testers. However, significant variability remained due to uncontrollable tester-dependent factors, reflected by varying outcome accuracy. Tester-dependent factors, combined with a small number of testers, limit the reliability and generalizability of outcome accuracy when used as a composite measure in vignette studies. Measuring and reporting different aspects of SC performance in isolation provides a more reliable assessment of SC performance. We developed an adjusted accuracy measure using a review and selection process to assess data algorithm quality. In addition, we demonstrated that symptom comprehension with different input methods can be feasibly compared. Future studies reporting accuracy need to apply vignette testing standardization and isolated metrics. 
UR - https://formative.jmir.org/2024/1/e49907 UR - http://dx.doi.org/10.2196/49907 UR - http://www.ncbi.nlm.nih.gov/pubmed/38820578 ID - info:doi/10.2196/49907 ER - TY - JOUR AU - Yoon, Dukyong AU - Han, Changho AU - Kim, Won Dong AU - Kim, Songsoo AU - Bae, SungA AU - Ryu, An Jee AU - Choi, Yujin PY - 2024/5/31 TI - Redefining Health Care Data Interoperability: Empirical Exploration of Large Language Models in Information Exchange JO - J Med Internet Res SP - e56614 VL - 26 KW - health care interoperability KW - large language models KW - medical data transformation KW - data standardization KW - text-based N2 - Background: Efficient data exchange and health care interoperability are impeded by medical records often being in nonstandardized or unstructured natural language format. Advanced language models, such as large language models (LLMs), may help overcome current challenges in information exchange. Objective: This study aims to evaluate the capability of LLMs in transforming and transferring health care data to support interoperability. Methods: Using data from the Medical Information Mart for Intensive Care III and UK Biobank, the study conducted 3 experiments. Experiment 1 assessed the accuracy of transforming structured laboratory results into unstructured format. Experiment 2 explored the conversion of diagnostic codes between the coding frameworks of the ICD-9-CM (International Classification of Diseases, Ninth Revision, Clinical Modification), and Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT) using a traditional mapping table and a text-based approach facilitated by the LLM ChatGPT. Experiment 3 focused on extracting targeted information from unstructured records that included comprehensive clinical information (discharge notes). 
Results: The text-based approach showed a high conversion accuracy in transforming laboratory results (experiment 1) and an enhanced consistency in diagnostic code conversion, particularly for frequently used diagnostic names, compared with the traditional mapping approach (experiment 2). In experiment 3, the LLM showed a positive predictive value of 87.2% in extracting generic drug names. Conclusions: This study highlighted the potential role of LLMs in significantly improving health care data interoperability, demonstrated by their high accuracy and efficiency in data transformation and exchange. The LLMs hold vast potential for enhancing medical data exchange without complex standardization for medical terms and data structure. UR - https://www.jmir.org/2024/1/e56614 UR - http://dx.doi.org/10.2196/56614 UR - http://www.ncbi.nlm.nih.gov/pubmed/38819879 ID - info:doi/10.2196/56614 ER - TY - JOUR AU - MacNeill, Luke A. AU - Doucet, Shelley AU - Luke, Alison PY - 2024/5/30 TI - Effectiveness of a Mental Health Chatbot for People With Chronic Diseases: Randomized Controlled Trial JO - JMIR Form Res SP - e50025 VL - 8 KW - chatbot KW - chronic disease KW - arthritis KW - diabetes KW - mental health KW - depression KW - anxiety KW - stress KW - effectiveness KW - application N2 - Background: People with chronic diseases tend to experience more mental health issues than their peers without these health conditions. Mental health chatbots offer a potential source of mental health support for people with chronic diseases. Objective: The aim of this study was to determine whether a mental health chatbot can improve mental health in people with chronic diseases. We focused on 2 chronic diseases in particular: arthritis and diabetes. Methods: Individuals with arthritis or diabetes were recruited using various web-based methods. Participants were randomly assigned to 1 of 2 groups. 
Those in the treatment group used a mental health chatbot app (Wysa [Wysa Inc]) over a period of 4 weeks. Those in the control group received no intervention. Participants completed measures of depression (Patient Health Questionnaire-9), anxiety (Generalized Anxiety Disorder Scale-7), and stress (Perceived Stress Scale-10) at baseline, with follow-up testing 2 and 4 weeks later. Participants in the treatment group completed feedback questions on their experiences with the app at the final assessment point. Results: A total of 68 participants (n=47, 69% women; mean age 42.87, SD 11.27 years) were included in the analysis. Participants were divided evenly between the treatment and control groups. Those in the treatment group reported decreases in depression (P<.001) and anxiety (P<.001) severity over the study period. No such changes were found among participants in the control group. No changes in stress were reported by participants in either group. Participants with arthritis reported higher levels of depression (P=.004) and anxiety (P=.004) severity than participants with diabetes over the course of the study, as well as higher levels of stress (P=.01); otherwise, patterns of results were similar across these health conditions. In response to the feedback questions, participants in the treatment group said that they liked many of the functions and features of the app, the general design of the app, and the user experience. They also disliked some aspects of the app, with most of these reports focusing on the chatbot's conversational abilities. Conclusions: The results of this study suggest that mental health chatbots can be an effective source of mental health support for people with chronic diseases such as arthritis and diabetes. Although cost-effective and accessible, these programs have limitations and may not be well suited for all individuals. 
Trial Registration: ClinicalTrials.gov NCT04620668; https://www.clinicaltrials.gov/study/NCT04620668 UR - https://formative.jmir.org/2024/1/e50025 UR - http://dx.doi.org/10.2196/50025 UR - http://www.ncbi.nlm.nih.gov/pubmed/38814681 ID - info:doi/10.2196/50025 ER - TY - JOUR AU - Schillings, Christine AU - Meißner, Echo AU - Erb, Benjamin AU - Bendig, Eileen AU - Schultchen, Dana AU - Pollatos, Olga PY - 2024/5/28 TI - Effects of a Chatbot-Based Intervention on Stress and Health-Related Parameters in a Stressed Sample: Randomized Controlled Trial JO - JMIR Ment Health SP - e50454 VL - 11 KW - chatbot KW - intervention KW - stress KW - interoception KW - interoceptive sensibility KW - mindfulness KW - emotion regulation KW - RCT KW - randomized controlled trial N2 - Background: Stress levels and the prevalence of mental disorders in the general population have been rising in recent years. Chatbot-based interventions represent novel and promising digital approaches to improve health-related parameters. However, there is a lack of research on chatbot-based interventions in the area of mental health. Objective: The aim of this study was to investigate the effects of a 3-week chatbot-based intervention guided by the chatbot ELME, specifically with respect to the ability to reduce stress and improve various health-related parameters in a stressed sample. Methods: In this multicenter two-armed randomized controlled trial, 118 individuals with medium to high stress levels were randomized to the intervention group (n=59) or the treatment-as-usual control group (n=59). The ELME chatbot guided participants of the intervention group through 3 weeks of training based on the topics stress, mindfulness, and interoception, with practical and psychoeducative elements delivered in two daily interactive intervention sessions via a smartphone (approximately 10-20 minutes each). 
The primary outcome (perceived stress) and secondary outcomes (mindfulness; interoception or interoceptive sensibility; subjective well-being; and emotion regulation, including the subfacets reappraisal and suppression) were assessed preintervention (T1), post intervention (T2; after 3 weeks), and at follow-up (T3; after 6 weeks). During both conditions, participants also underwent ecological momentary assessments of stress and interoceptive sensibility. Results: There were no significant changes in perceived stress (β03=−.018, SE=.329; P=.96) and momentary stress. Mindfulness and the subfacet reappraisal significantly increased in the intervention group over time, whereas there was no change in the subfacet suppression. Well-being and momentary interoceptive sensibility increased in both groups over time. Conclusions: To gain insight into how the intervention can be improved to achieve its full potential for stress reduction, besides a longer intervention duration, specific sample subgroups should be considered. The chatbot-based intervention seems to have the potential to improve mindfulness and emotion regulation in a stressed sample. Future chatbot-based studies and interventions in health care should be designed based on the latest findings on the efficacy of rule-based and artificial intelligence–based chatbots. 
Trial Registration: German Clinical Trials Register DRKS00027560; https://drks.de/search/en/trial/DRKS00027560 International Registered Report Identifier (IRRID): RR2-doi.org/10.3389/fdgth.2023.1046202 UR - https://mental.jmir.org/2024/1/e50454 UR - http://dx.doi.org/10.2196/50454 UR - http://www.ncbi.nlm.nih.gov/pubmed/38805259 ID - info:doi/10.2196/50454 ER - TY - JOUR AU - Choudhury, Avishek AU - Shamszare, Hamid PY - 2024/5/27 TI - The Impact of Performance Expectancy, Workload, Risk, and Satisfaction on Trust in ChatGPT: Cross-Sectional Survey Analysis JO - JMIR Hum Factors SP - e55399 VL - 11 KW - ChatGPT KW - chatbots KW - health care KW - health care decision-making KW - health-related decision-making KW - health care management KW - decision-making KW - user perception KW - usability KW - usable KW - usableness KW - usefulness KW - artificial intelligence KW - algorithms KW - predictive models KW - predictive analytics KW - predictive system KW - practical models KW - deep learning KW - cross-sectional survey N2 - Background: ChatGPT (OpenAI) is a powerful tool for a wide range of tasks, from entertainment and creativity to health care queries. There are potential risks and benefits associated with this technology. In the discourse concerning the deployment of ChatGPT and similar large language models, it is sensible to recommend their use primarily for tasks a human user can execute accurately. As we transition into the subsequent phase of ChatGPT deployment, establishing realistic performance expectations and understanding users' perceptions of risk associated with its use are crucial in determining the successful integration of this artificial intelligence (AI) technology. Objective: The aim of the study is to explore how perceived workload, satisfaction, performance expectancy, and risk-benefit perception influence users' trust in ChatGPT. 
Methods: A semistructured, web-based survey was conducted with 607 adults in the United States who actively use ChatGPT. The survey questions were adapted from constructs used in various models and theories such as the technology acceptance model, the theory of planned behavior, the unified theory of acceptance and use of technology, and research on trust and security in digital environments. To test our hypotheses and structural model, we used the partial least squares structural equation modeling method, a widely used approach for multivariate analysis. Results: A total of 607 people responded to our survey. A significant portion of the participants held at least a high school diploma (n=204, 33.6%), and the majority had a bachelor's degree (n=262, 43.1%). The primary motivations for participants to use ChatGPT were for acquiring information (n=219, 36.1%), amusement (n=203, 33.4%), and addressing problems (n=135, 22.2%). Some participants used it for health-related inquiries (n=44, 7.2%), while a few others (n=6, 1%) used it for miscellaneous activities such as brainstorming, grammar verification, and blog content creation. Our model explained 64.6% of the variance in trust. Our analysis indicated a significant relationship between (1) workload and satisfaction, (2) trust and satisfaction, (3) performance expectations and trust, and (4) risk-benefit perception and trust. Conclusions: The findings underscore the importance of ensuring user-friendly design and functionality in AI-based applications to reduce workload and enhance user satisfaction, thereby increasing user trust. Future research should further explore the relationship between risk-benefit perception and trust in the context of AI chatbots. 
UR - https://humanfactors.jmir.org/2024/1/e55399 UR - http://dx.doi.org/10.2196/55399 UR - http://www.ncbi.nlm.nih.gov/pubmed/38801658 ID - info:doi/10.2196/55399 ER - TY - JOUR AU - Wang, Anan AU - Wu, Yunong AU - Ji, Xiaojian AU - Wang, Xiangyang AU - Hu, Jiawen AU - Zhang, Fazhan AU - Zhang, Zhanchao AU - Pu, Dong AU - Tang, Lulu AU - Ma, Shikui AU - Liu, Qiang AU - Dong, Jing AU - He, Kunlun AU - Li, Kunpeng AU - Teng, Da AU - Li, Tao PY - 2024/5/24 TI - Assessing and Optimizing Large Language Models on Spondyloarthritis Multi-Choice Question Answering: Protocol for Enhancement and Assessment JO - JMIR Res Protoc SP - e57001 VL - 13 KW - spondyloarthritis KW - benchmark KW - large language model KW - artificial intelligence KW - AI KW - AI chatbot KW - AI-assistant diagnosis N2 - Background: Spondyloarthritis (SpA), a chronic inflammatory disorder, predominantly impacts the sacroiliac joints and spine, significantly escalating the risk of disability. SpA's complexity, as evidenced by its diverse clinical presentations and symptoms that often mimic other diseases, presents substantial challenges in its accurate diagnosis and differentiation. This complexity becomes even more pronounced in nonspecialist health care environments due to limited resources, resulting in delayed referrals, increased misdiagnosis rates, and exacerbated disability outcomes for patients with SpA. The emergence of large language models (LLMs) in medical diagnostics introduces a revolutionary potential to overcome these diagnostic hurdles. Despite recent advancements in artificial intelligence and LLMs demonstrating effectiveness in diagnosing and treating various diseases, their application in SpA remains underdeveloped. Currently, there is a notable absence of SpA-specific LLMs and an established benchmark for assessing the performance of such models in this particular field. 
Objective: Our objective is to develop a foundational medical model, creating a comprehensive evaluation benchmark tailored to the essential medical knowledge of SpA and its unique diagnostic and treatment protocols. The model, post-pretraining, will be subject to further enhancement through supervised fine-tuning. It is projected to significantly aid physicians in SpA diagnosis and treatment, especially in settings with limited access to specialized care. Furthermore, this initiative is poised to promote early and accurate SpA detection at the primary care level, thereby diminishing the risks associated with delayed or incorrect diagnoses. Methods: A rigorous benchmark, comprising 222 meticulously formulated multiple-choice questions on SpA, will be established and developed. These questions will be extensively revised to ensure their suitability for accurately evaluating LLMs' performance in real-world diagnostic and therapeutic scenarios. Our methodology involves selecting and refining top foundational models using public data sets. The best-performing model in our benchmark will undergo further training. Subsequently, more than 80,000 real-world inpatient and outpatient cases from hospitals will enhance LLM training, incorporating techniques such as supervised fine-tuning and low-rank adaptation. We will rigorously assess the models' generated responses for accuracy and evaluate their reasoning processes using the metrics of fluency, relevance, completeness, and medical proficiency. Results: Development of the model is progressing, with significant enhancements anticipated by early 2024. The benchmark, along with the results of evaluations, is expected to be released in the second quarter of 2024. Conclusions: Our trained model aims to capitalize on the capabilities of LLMs in analyzing complex clinical data, thereby enabling precise detection, diagnosis, and treatment of SpA. 
This innovation is anticipated to play a vital role in diminishing the disabilities arising from delayed or incorrect SpA diagnoses. By promoting this model across diverse health care settings, we anticipate a significant improvement in SpA management, culminating in enhanced patient outcomes and a reduced overall burden of the disease. International Registered Report Identifier (IRRID): DERR1-10.2196/57001 UR - https://www.researchprotocols.org/2024/1/e57001 UR - http://dx.doi.org/10.2196/57001 UR - http://www.ncbi.nlm.nih.gov/pubmed/38788208 ID - info:doi/10.2196/57001 ER - TY - JOUR AU - Takagi, Soshi AU - Koda, Masahide AU - Watari, Takashi PY - 2024/5/23 TI - The Performance of ChatGPT-4V in Interpreting Images and Tables in the Japanese Medical Licensing Exam JO - JMIR Med Educ SP - e54283 VL - 10 KW - ChatGPT KW - medical licensing examination KW - generative artificial intelligence KW - medical education KW - large language model KW - images KW - tables KW - artificial intelligence KW - AI KW - Japanese KW - reliability KW - medical application KW - medical applications KW - diagnostic KW - diagnostics KW - online data KW - web-based data UR - https://mededu.jmir.org/2024/1/e54283 UR - http://dx.doi.org/10.2196/54283 ID - info:doi/10.2196/54283 ER - TY - JOUR AU - Chelli, Mikaël AU - Descamps, Jules AU - Lavoué, Vincent AU - Trojani, Christophe AU - Azar, Michel AU - Deckert, Marcel AU - Raynier, Jean-Luc AU - Clowez, Gilles AU - Boileau, Pascal AU - Ruetsch-Chelli, Caroline PY - 2024/5/22 TI - Hallucination Rates and Reference Accuracy of ChatGPT and Bard for Systematic Reviews: Comparative Analysis JO - J Med Internet Res SP - e53164 VL - 26 KW - artificial intelligence KW - large language models KW - ChatGPT KW - Bard KW - rotator cuff KW - systematic reviews KW - literature search KW - hallucinated KW - human conducted N2 - Background: Large language models (LLMs) have raised both interest and concern in the academic community. 
They offer the potential for automating literature search and synthesis for systematic reviews but raise concerns regarding their reliability, as the tendency to generate unsupported (hallucinated) content persists. Objective: The aim of the study is to assess the performance of LLMs such as ChatGPT and Bard (subsequently rebranded Gemini) to produce references in the context of scientific writing. Methods: The performance of ChatGPT and Bard in replicating the results of human-conducted systematic reviews was assessed. Using systematic reviews pertaining to shoulder rotator cuff pathology, these LLMs were tested by providing the same inclusion criteria and comparing the results with original systematic review references, serving as gold standards. The study used 3 key performance metrics: recall, precision, and F1-score, alongside the hallucination rate. Papers were considered "hallucinated" if any 2 of the following details were wrong: title, first author, or year of publication. Results: In total, 11 systematic reviews across 4 fields yielded 33 prompts to LLMs (3 LLMs×11 reviews), with 471 references analyzed. Precision rates for GPT-3.5, GPT-4, and Bard were 9.4% (13/139), 13.4% (16/119), and 0% (0/104), respectively (P<.001). Recall rates were 11.9% (13/109) for GPT-3.5 and 13.7% (15/109) for GPT-4, with Bard failing to retrieve any relevant papers (P<.001). Hallucination rates stood at 39.6% (55/139) for GPT-3.5, 28.6% (34/119) for GPT-4, and 91.4% (95/104) for Bard (P<.001). Further analysis of nonhallucinated papers retrieved by GPT models revealed significant differences in identifying various criteria, such as randomized studies, participant criteria, and intervention criteria. The study also noted the geographical and open-access biases in the papers retrieved by the LLMs. Conclusions: Given their current performance, it is not recommended for LLMs to be deployed as the primary or exclusive tool for conducting systematic reviews. 
Any references generated by such models warrant thorough validation by researchers. The high occurrence of hallucinations in LLMs highlights the necessity for refining their training and functionality before confidently using them for rigorous academic purposes. UR - https://www.jmir.org/2024/1/e53164 UR - http://dx.doi.org/10.2196/53164 UR - http://www.ncbi.nlm.nih.gov/pubmed/38776130 ID - info:doi/10.2196/53164 ER - TY - JOUR AU - Xue, Elisabetta AU - Bracken-Clarke, Dara AU - Iannantuono, Maria Giovanni AU - Choo-Wosoba, Hyoyoung AU - Gulley, L. James AU - Floudas, S. Charalampos PY - 2024/5/17 TI - Utility of Large Language Models for Health Care Professionals and Patients in Navigating Hematopoietic Stem Cell Transplantation: Comparison of the Performance of ChatGPT-3.5, ChatGPT-4, and Bard JO - J Med Internet Res SP - e54758 VL - 26 KW - hematopoietic stem cell transplant KW - large language models KW - chatbot KW - chatbots KW - stem cell KW - large language model KW - artificial intelligence KW - AI KW - medical information KW - hematopoietic KW - HSCT KW - ChatGPT N2 - Background: Artificial intelligence is increasingly being applied to many workflows. Large language models (LLMs) are publicly accessible platforms trained to understand, interact with, and produce human-readable text; their ability to deliver relevant and reliable information is also of particular interest to health care providers and patients. Hematopoietic stem cell transplantation (HSCT) is a complex medical field requiring extensive knowledge, background, and training to practice successfully and can be challenging for the nonspecialist audience to comprehend. Objective: We aimed to test the applicability of 3 prominent LLMs, namely ChatGPT-3.5 (OpenAI), ChatGPT-4 (OpenAI), and Bard (Google AI), in guiding nonspecialist health care professionals and advising patients seeking information regarding HSCT. 
Methods: We submitted 72 open-ended HSCT-related questions of variable difficulty to the LLMs and rated their responses based on consistency (defined as replicability of the response), response veracity, language comprehensibility, specificity to the topic, and the presence of hallucinations. We then rechallenged the 2 best-performing chatbots by resubmitting the most difficult questions and prompting them to respond as if communicating with either a health care professional or a patient and to provide verifiable sources of information. Responses were then rerated with the additional criterion of language appropriateness, defined as language adaptation for the intended audience. Results: ChatGPT-4 outperformed both ChatGPT-3.5 and Bard in terms of response consistency (66/72, 92%; 54/72, 75%; and 63/69, 91%, respectively; P=.007), response veracity (58/66, 88%; 40/54, 74%; and 16/63, 25%, respectively; P<.001), and specificity to the topic (60/66, 91%; 43/54, 80%; and 27/63, 43%, respectively; P<.001). Both ChatGPT-4 and ChatGPT-3.5 outperformed Bard in terms of language comprehensibility (64/66, 97%; 53/54, 98%; and 52/63, 83%, respectively; P=.002). All displayed episodes of hallucinations. ChatGPT-3.5 and ChatGPT-4 were then rechallenged with a prompt to adapt their language to the audience and to provide sources of information, and responses were rated. ChatGPT-3.5 showed better ability to adapt its language to a nonmedical audience than ChatGPT-4 (17/21, 81% and 10/22, 46%, respectively; P=.03); however, both failed to consistently provide correct and up-to-date information resources, reporting either out-of-date materials, incorrect URLs, or unfocused references, making their output not verifiable by the reader. Conclusions: In conclusion, despite LLMs' 
potential capability in confronting challenging medical topics such as HSCT, the presence of mistakes and lack of clear references make them not yet appropriate for routine, unsupervised clinical use or patient counseling. Implementation of LLMs' ability to access and to reference current and updated websites and research papers, as well as development of LLMs trained in specialized domain knowledge data sets, may offer potential solutions for their future clinical application. UR - https://www.jmir.org/2024/1/e54758 UR - http://dx.doi.org/10.2196/54758 UR - http://www.ncbi.nlm.nih.gov/pubmed/38758582 ID - info:doi/10.2196/54758 ER - TY - JOUR AU - Lambert, Raphaella AU - Choo, Zi-Yi AU - Gradwohl, Kelsey AU - Schroedl, Liesl AU - Ruiz De Luzuriaga, Arlene PY - 2024/5/16 TI - Assessing the Application of Large Language Models in Generating Dermatologic Patient Education Materials According to Reading Level: Qualitative Study JO - JMIR Dermatol SP - e55898 VL - 7 KW - artificial intelligence KW - large language models KW - large language model KW - LLM KW - LLMs KW - machine learning KW - natural language processing KW - deep learning KW - ChatGPT KW - health literacy KW - health knowledge KW - health information KW - patient education KW - dermatology KW - dermatologist KW - dermatologists KW - derm KW - dermatology resident KW - dermatology residents KW - dermatologic patient education material KW - dermatologic patient education materials KW - patient education material KW - patient education materials KW - education material KW - education materials N2 - Background: Dermatologic patient education materials (PEMs) are often written above the national average seventh- to eighth-grade reading level. ChatGPT-3.5, GPT-4, DermGPT, and DocsGPT are large language models (LLMs) that are responsive to user prompts. Our project assesses their use in generating dermatologic PEMs at specified reading levels. 
Objective: This study aims to assess the ability of select LLMs to generate PEMs for common and rare dermatologic conditions at unspecified and specified reading levels. Further, the study aims to assess the preservation of meaning across such LLM-generated PEMs, as assessed by dermatology resident trainees. Methods: The Flesch-Kincaid reading level (FKRL) of current American Academy of Dermatology PEMs was evaluated for 4 common (atopic dermatitis, acne vulgaris, psoriasis, and herpes zoster) and 4 rare (epidermolysis bullosa, bullous pemphigoid, lamellar ichthyosis, and lichen planus) dermatologic conditions. We prompted ChatGPT-3.5, GPT-4, DermGPT, and DocsGPT to "Create a patient education handout about [condition] at a [FKRL]" to iteratively generate 10 PEMs per condition at unspecified fifth- and seventh-grade FKRLs, evaluated with Microsoft Word readability statistics. The preservation of meaning across LLMs was assessed by 2 dermatology resident trainees. Results: The current American Academy of Dermatology PEMs had an average (SD) FKRL of 9.35 (1.26) and 9.50 (2.3) for common and rare diseases, respectively. For common diseases, the FKRLs of LLM-produced PEMs ranged between 9.8 and 11.21 (unspecified prompt), between 4.22 and 7.43 (fifth-grade prompt), and between 5.98 and 7.28 (seventh-grade prompt). For rare diseases, the FKRLs of LLM-produced PEMs ranged between 9.85 and 11.45 (unspecified prompt), between 4.22 and 7.43 (fifth-grade prompt), and between 5.98 and 7.28 (seventh-grade prompt). At the fifth-grade reading level, GPT-4 was better at producing PEMs for both common and rare conditions than ChatGPT-3.5 (P=.001 and P=.01, respectively), DermGPT (P<.001 and P=.03, respectively), and DocsGPT (P<.001 and P=.02, respectively). 
At the seventh-grade reading level, no significant difference was found between ChatGPT-3.5, GPT-4, DocsGPT, or DermGPT in producing PEMs for common conditions (all P>.05); however, for rare conditions, ChatGPT-3.5 and DocsGPT outperformed GPT-4 (P=.003 and P<.001, respectively). The preservation of meaning analysis revealed that for common conditions, DermGPT ranked the highest for overall ease of reading, patient understandability, and accuracy (14.75/15, 98%); for rare conditions, handouts generated by GPT-4 ranked the highest (14.5/15, 97%). Conclusions: GPT-4 appeared to outperform ChatGPT-3.5, DocsGPT, and DermGPT at the fifth-grade FKRL for both common and rare conditions, although both ChatGPT-3.5 and DocsGPT performed better than GPT-4 at the seventh-grade FKRL for rare conditions. LLM-produced PEMs may reliably meet seventh-grade FKRLs for select common and rare dermatologic conditions and are easy to read, understandable for patients, and mostly accurate. LLMs may play a role in enhancing health literacy and disseminating accessible, understandable PEMs in dermatology. 
UR - https://derma.jmir.org/2024/1/e55898 UR - http://dx.doi.org/10.2196/55898 UR - http://www.ncbi.nlm.nih.gov/pubmed/38754096 ID - info:doi/10.2196/55898 ER - TY - JOUR AU - Gwon, Nam Yong AU - Kim, Heon Jae AU - Chung, Soo Hyun AU - Jung, Jee Eun AU - Chun, Joey AU - Lee, Serin AU - Shim, Ryul Sung PY - 2024/5/14 TI - The Use of Generative AI for Scientific Literature Searches for Systematic Reviews: ChatGPT and Microsoft Bing AI Performance Evaluation JO - JMIR Med Inform SP - e51187 VL - 12 KW - artificial intelligence KW - search engine KW - systematic review KW - evidence-based medicine KW - ChatGPT KW - language model KW - education KW - tool KW - clinical decision support system KW - decision support KW - support KW - treatment N2 - Background: A large language model is a type of artificial intelligence (AI) model that opens up great possibilities for health care practice, research, and education, although scholars have emphasized the need to proactively address the issue of unvalidated and inaccurate information regarding its use. One of the best-known large language models is ChatGPT (OpenAI). It is believed to be of great help to medical research, as it facilitates more efficient data set analysis, code generation, and literature review, allowing researchers to focus on experimental design as well as drug discovery and development. Objective: This study aims to explore the potential of ChatGPT as a real-time literature search tool for systematic reviews and clinical decision support systems, to enhance their efficiency and accuracy in health care settings. Methods: The search results of a published systematic review by human experts on the treatment of Peyronie disease were selected as a benchmark, and the literature search formula of the study was applied to ChatGPT and Microsoft Bing AI as a comparison to human researchers. 
Peyronie disease typically presents with discomfort, curvature, or deformity of the penis in association with palpable plaques and erectile dysfunction. To evaluate the quality of individual studies derived from AI answers, we created a structured rating system based on bibliographic information related to the publications. We classified its answers into 4 grades if the title existed: A, B, C, and F. No grade was given for a fake title or no answer. Results: From ChatGPT, 7 (0.5%) out of 1287 identified studies were directly relevant, whereas Bing AI resulted in 19 (40%) relevant studies out of 48, compared to the human benchmark of 24 studies. In the qualitative evaluation, ChatGPT had 7 grade A, 18 grade B, 167 grade C, and 211 grade F studies, and Bing AI had 19 grade A and 28 grade C studies. Conclusions: This is the first study to compare AI and conventional human systematic review methods as a real-time literature collection tool for evidence-based medicine. The results suggest that the use of ChatGPT as a tool for real-time evidence generation is not yet accurate and feasible. Therefore, researchers should be cautious about using such AI. The limitations of this study using the generative pre-trained transformer model are that the search for research topics was not diverse and that it did not prevent the hallucination of generative AI. However, this study will serve as a standard for future studies by providing an index to verify the reliability and consistency of generative AI from a user's point of view. If the reliability and consistency of AI literature search services are verified, then the use of these technologies will help medical research greatly. UR - https://medinform.jmir.org/2024/1/e51187 UR - http://dx.doi.org/10.2196/51187 ID - info:doi/10.2196/51187 ER - TY - JOUR AU - Chiu, Keith Wan Hang AU - Ko, Koel Wei Sum AU - Cho, Shing William Chi AU - Hui, Joanne Sin Yu AU - Chan, Lawrence Wing Chi AU - Kuo, D. 
Michael PY - 2024/5/13 TI - Evaluating the Diagnostic Performance of Large Language Models on Complex Multimodal Medical Cases JO - J Med Internet Res SP - e53724 VL - 26 KW - large language model KW - hospital KW - health center KW - Massachusetts KW - statistical analysis KW - chi-square KW - ANOVA KW - clinician KW - physician KW - performance KW - proficiency KW - disease etiology UR - https://www.jmir.org/2024/1/e53724 UR - http://dx.doi.org/10.2196/53724 UR - http://www.ncbi.nlm.nih.gov/pubmed/38739441 ID - info:doi/10.2196/53724 ER - TY - JOUR AU - Denecke, Kerstin AU - May, Richard AU - Rivera Romero, Octavio PY - 2024/5/13 TI - Potential of Large Language Models in Health Care: Delphi Study JO - J Med Internet Res SP - e52399 VL - 26 KW - large language models KW - LLMs KW - health care KW - Delphi study KW - natural language processing KW - NLP KW - artificial intelligence KW - language model KW - Delphi KW - future KW - innovation KW - interview KW - interviews KW - informatics KW - experience KW - experiences KW - attitude KW - attitudes KW - opinion KW - perception KW - perceptions KW - perspective KW - perspectives KW - implementation N2 - Background: A large language model (LLM) is a machine learning model inferred from text data that captures subtle patterns of language use in context. Modern LLMs are based on neural network architectures that incorporate transformer methods. They allow the model to relate words together through attention to multiple words in a text sequence. LLMs have been shown to be highly effective for a range of tasks in natural language processing (NLP), including classification and information extraction tasks and generative applications. Objective: The aim of this adapted Delphi study was to collect researchers' opinions on how LLMs might influence health care and on the strengths, weaknesses, opportunities, and threats of LLM use in health care. 
Methods: We invited researchers in the fields of health informatics, nursing informatics, and medical NLP to share their opinions on LLM use in health care. We started the first round with open questions based on our strengths, weaknesses, opportunities, and threats framework. In the second and third round, the participants scored these items. Results: The first, second, and third rounds had 28, 23, and 21 participants, respectively. Almost all participants (26/28, 93% in round 1 and 20/21, 95% in round 3) were affiliated with academic institutions. Agreement was reached on 103 items related to use cases, benefits, risks, reliability, adoption aspects, and the future of LLMs in health care. Participants offered several use cases, including supporting clinical tasks, documentation tasks, and medical research and education, and agreed that LLM-based systems will act as health assistants for patient education. The agreed-upon benefits included increased efficiency in data handling and extraction, improved automation of processes, improved quality of health care services and overall health outcomes, provision of personalized care, accelerated diagnosis and treatment processes, and improved interaction between patients and health care professionals. In total, 5 risks to health care in general were identified: cybersecurity breaches, the potential for patient misinformation, ethical concerns, the likelihood of biased decision-making, and the risk associated with inaccurate communication. Overconfidence in LLM-based systems was recognized as a risk to the medical profession. The 6 agreed-upon privacy risks included the use of unregulated cloud services that compromise data security, exposure of sensitive patient data, breaches of confidentiality, fraudulent use of information, vulnerabilities in data storage and communication, and inappropriate access or use of patient data. 
Conclusions: Future research related to LLMs should not only focus on testing their possibilities for NLP-related tasks but also consider the workflows the models could contribute to and the requirements regarding quality, integration, and regulations needed for successful implementation in practice. UR - https://www.jmir.org/2024/1/e52399 UR - http://dx.doi.org/10.2196/52399 UR - http://www.ncbi.nlm.nih.gov/pubmed/38739445 ID - info:doi/10.2196/52399 ER - TY - JOUR AU - Preiksaitis, Carl AU - Ashenburg, Nicholas AU - Bunney, Gabrielle AU - Chu, Andrew AU - Kabeer, Rana AU - Riley, Fran AU - Ribeira, Ryan AU - Rose, Christian PY - 2024/5/10 TI - The Role of Large Language Models in Transforming Emergency Medicine: Scoping Review JO - JMIR Med Inform SP - e53787 VL - 12 KW - large language model KW - LLM KW - emergency medicine KW - clinical decision support KW - workflow efficiency KW - medical education KW - artificial intelligence KW - AI KW - natural language processing KW - NLP KW - AI literacy KW - ChatGPT KW - Bard KW - Pathways Language Model KW - Med-PaLM KW - Bidirectional Encoder Representations from Transformers KW - BERT KW - generative pretrained transformer KW - GPT KW - United States KW - US KW - China KW - scoping review KW - Preferred Reporting Items for Systematic Reviews and Meta-Analyses KW - PRISMA KW - decision support KW - risk KW - ethics KW - education KW - communication KW - medical training KW - physician KW - health literacy KW - emergency care N2 - Background: Artificial intelligence (AI), more specifically large language models (LLMs), holds significant potential in revolutionizing emergency care delivery by optimizing clinical workflows and enhancing the quality of decision-making. Although enthusiasm for integrating LLMs into emergency medicine (EM) is growing, the existing literature is characterized by a disparate collection of individual studies, conceptual analyses, and preliminary implementations. 
Given these complexities and gaps in understanding, a cohesive framework is needed to comprehend the existing body of knowledge on the application of LLMs in EM. Objective: Given the absence of a comprehensive framework for exploring the roles of LLMs in EM, this scoping review aims to systematically map the existing literature on LLMs' potential applications within EM and identify directions for future research. Addressing this gap will allow for informed advancements in the field. Methods: Using PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) criteria, we searched Ovid MEDLINE, Embase, Web of Science, and Google Scholar for papers published between January 2018 and August 2023 that discussed LLMs' use in EM. We excluded other forms of AI. A total of 1994 unique titles and abstracts were screened, and each full-text paper was independently reviewed by 2 authors. Data were abstracted independently, and 5 authors performed a collaborative quantitative and qualitative synthesis of the data. Results: A total of 43 papers were included. Studies were predominantly from 2022 to 2023 and conducted in the United States and China. We uncovered four major themes: (1) clinical decision-making and support was highlighted as a pivotal area, with LLMs playing a substantial role in enhancing patient care, notably through their application in real-time triage, allowing early recognition of patient urgency; (2) efficiency, workflow, and information management demonstrated the capacity of LLMs to significantly boost operational efficiency, particularly through the automation of patient record synthesis, which could reduce administrative burden and enhance patient-centric care; (3) risks, ethics, and transparency were identified as areas of concern, especially regarding the reliability of LLMs' 
outputs, and specific studies highlighted the challenges of ensuring unbiased decision-making amidst potentially flawed training data sets, stressing the importance of thorough validation and ethical oversight; and (4) education and communication possibilities included LLMs' capacity to enrich medical training, such as through using simulated patient interactions that enhance communication skills. Conclusions: LLMs have the potential to fundamentally transform EM, enhancing clinical decision-making, optimizing workflows, and improving patient outcomes. This review sets the stage for future advancements by identifying key research areas: prospective validation of LLM applications, establishing standards for responsible use, understanding provider and patient perceptions, and improving physicians' AI literacy. Effective integration of LLMs into EM will require collaborative efforts and thorough evaluation to ensure these technologies can be safely and effectively applied. UR - https://medinform.jmir.org/2024/1/e53787 UR - http://dx.doi.org/10.2196/53787 UR - http://www.ncbi.nlm.nih.gov/pubmed/38728687 ID - info:doi/10.2196/53787 ER - TY - JOUR AU - Skryd, Anthony AU - Lawrence, Katharine PY - 2024/5/8 TI - ChatGPT as a Tool for Medical Education and Clinical Decision-Making on the Wards: Case Study JO - JMIR Form Res SP - e51346 VL - 8 KW - ChatGPT KW - medical education KW - large language models KW - LLMs KW - clinical decision-making N2 - Background: Large language models (LLMs) are computational artificial intelligence systems with advanced natural language processing capabilities that have recently been popularized among health care students and educators due to their ability to provide real-time access to a vast amount of medical knowledge. The adoption of LLM technology into medical education and training has varied, and little empirical evidence exists to support its use in clinical teaching environments. 
Objective: The aim of the study is to identify and qualitatively evaluate potential use cases and limitations of LLM technology for real-time ward-based educational contexts. Methods: A brief, single-site exploratory evaluation of the publicly available ChatGPT-3.5 (OpenAI) was conducted by implementing the tool into the daily attending rounds of a general internal medicine inpatient service at a large urban academic medical center. ChatGPT was integrated into rounds via both structured and organic use, using the web-based "chatbot"-style interface to interact with the LLM through conversational free-text and discrete queries. A qualitative approach using phenomenological inquiry was used to identify key insights related to the use of ChatGPT through analysis of ChatGPT conversation logs and associated shorthand notes from the clinical sessions. Results: Identified use cases for ChatGPT integration included addressing medical knowledge gaps through discrete medical knowledge inquiries, building differential diagnoses and engaging dual-process thinking, challenging medical axioms, using cognitive aids to support acute care decision-making, and improving complex care management by facilitating conversations with subspecialties. Potential additional uses included engaging in difficult conversations with patients, exploring ethical challenges and general medical ethics teaching, personal continuing medical education resources, developing ward-based teaching tools, supporting and automating clinical documentation, and supporting productivity and task management. LLM biases, misinformation, ethics, and health equity were identified as areas of concern and potential limitations to clinical and training use. A code of conduct on ethical and appropriate use was also developed to guide team usage on the wards. 
Conclusions: Overall, ChatGPT offers a novel tool to enhance ward-based learning through rapid information querying, second-order content exploration, and engaged team discussion regarding generated responses. More research is needed to fully understand contexts for educational use, particularly regarding the risks and limitations of the tool in clinical settings and its impacts on trainee development. UR - https://formative.jmir.org/2024/1/e51346 UR - http://dx.doi.org/10.2196/51346 UR - http://www.ncbi.nlm.nih.gov/pubmed/38717811 ID - info:doi/10.2196/51346 ER - TY - JOUR AU - Hacking, Sean PY - 2024/5/7 TI - ChatGPT and Medicine: Together We Embrace the AI Renaissance JO - JMIR Bioinform Biotech SP - e52700 VL - 5 KW - ChatGPT KW - generative AI KW - NLP KW - medicine KW - bioinformatics KW - AI democratization KW - AI renaissance KW - artificial intelligence KW - natural language processing UR - https://bioinform.jmir.org/2024/1/e52700 UR - http://dx.doi.org/10.2196/52700 UR - http://www.ncbi.nlm.nih.gov/pubmed/38935938 ID - info:doi/10.2196/52700 ER - TY - JOUR AU - Chew, Jocelyn Han Shi AU - Chew, WS Nicholas AU - Loong, Ern Shaun Seh AU - Lim, Lin Su AU - Tam, Wilson Wai San AU - Chin, Han Yip AU - Chao, M. Ariana AU - Dimitriadis, K. 
Georgios AU - Gao, Yujia AU - So, Yan Jimmy Bok AU - Shabbir, Asim AU - Ngiam, Yuan Kee PY - 2024/5/7 TI - Effectiveness of an Artificial Intelligence-Assisted App for Improving Eating Behaviors: Mixed Methods Evaluation JO - J Med Internet Res SP - e46036 VL - 26 KW - artificial intelligence KW - chatbot KW - chatbots KW - weight KW - overweight KW - eating KW - food KW - weight loss KW - mHealth KW - mobile health KW - app KW - apps KW - applications KW - self-regulation KW - self-monitoring KW - anxiety KW - depression KW - consideration of future consequences KW - mental health KW - conversational agent KW - conversational agents KW - eating behavior KW - healthy eating KW - food consumption KW - obese KW - obesity KW - diet KW - dietary N2 - Background: A plethora of weight management apps are available, but many individuals, especially those living with overweight and obesity, still struggle to achieve adequate weight loss. An emerging area in weight management is the support for one's self-regulation over momentary eating impulses. Objective: This study aims to examine the feasibility and effectiveness of a novel artificial intelligence-assisted weight management app in improving eating behaviors in a Southeast Asian cohort. Methods: A single-group pretest-posttest study was conducted. Participants completed the 1-week run-in period of a 12-week app-based weight management program called the Eating Trigger-Response Inhibition Program (eTRIP). This self-monitoring system was built upon 3 main components, namely, (1) chatbot-based check-ins on eating lapse triggers, (2) food-based computer vision image recognition (system built based on local food items), and (3) automated time-based nudges and meal stopwatch. 
At every mealtime, participants were prompted to take a picture of their food items, which were identified by a computer vision image recognition technology, thereby triggering a set of chatbot-initiated questions on eating triggers such as who the users were eating with. Paired 2-sided t tests were used to compare the differences in the psychobehavioral constructs before and after the 7-day program, including overeating habits, snacking habits, consideration of future consequences, self-regulation of eating behaviors, anxiety, depression, and physical activity. Qualitative feedback was analyzed by content analysis according to 4 steps, namely, decontextualization, recontextualization, categorization, and compilation. Results: The mean age, self-reported BMI, and waist circumference of the participants were 31.25 (SD 9.98) years, 28.86 (SD 7.02) kg/m2, and 92.60 (SD 18.24) cm, respectively. There were significant improvements in all 7 psychobehavioral constructs, except for anxiety. After adjusting for multiple comparisons, statistically significant improvements were found for overeating habits (mean −0.32, SD 1.16; P<.001), snacking habits (mean −0.22, SD 1.12; P<.002), self-regulation of eating behavior (mean 0.08, SD 0.49; P=.007), depression (mean −0.12, SD 0.74; P=.007), and physical activity (mean 1288.60, SD 3055.20 metabolic equivalent task-min/day; P<.001). Forty-one participants reported skipping at least 1 meal (ie, breakfast, lunch, or dinner), summing to 578 (67.1%) of the 862 meals skipped. Of the 230 participants, 80 (34.8%) provided textual feedback that indicated satisfactory user experience with eTRIP. Four themes emerged, namely, (1) becoming more mindful of self-monitoring, (2) personalized reminders with prompts and chatbot, (3) food logging with image recognition, and (4) engaging with a simple, easy, and appealing user interface. The attrition rate was 8.4% (21/251).
Conclusions: eTRIP is a feasible and effective weight management program that warrants testing in a larger population for its effectiveness and sustainability as a personalized weight management program for people with overweight and obesity. Trial Registration: ClinicalTrials.gov NCT04833803; https://classic.clinicaltrials.gov/ct2/show/NCT04833803 UR - https://www.jmir.org/2024/1/e46036 UR - http://dx.doi.org/10.2196/46036 UR - http://www.ncbi.nlm.nih.gov/pubmed/38713909 ID - info:doi/10.2196/46036 ER - TY - JOUR AU - Galea, T. Jerome AU - Vasquez, H. Diego AU - Rupani, Neil AU - Gordon, B. Moya AU - Tapia, Milagros AU - Greene, Y. Karah AU - Kolevic, Lenka AU - Franke, F. Molly AU - Contreras, Carmen PY - 2024/5/7 TI - Development and Pilot-Testing of an Optimized Conversational Agent or "Chatbot" for Peruvian Adolescents Living With HIV to Facilitate Mental Health Screening, Education, Self-Help, and Linkage to Care: Protocol for a Mixed Methods, Community-Engaged Study JO - JMIR Res Protoc SP - e55559 VL - 13 KW - chatbot KW - digital assistant KW - depression KW - HIV KW - adolescents N2 - Background: Adolescents living with HIV are disproportionately affected by depression, which worsens antiretroviral therapy adherence, increases viral load, and doubles the risk of mortality. Because most adolescents living with HIV live in low- and middle-income countries, few receive depression treatment due to a lack of mental health services and specialists in low-resource settings. Chatbot technology, used increasingly in health service delivery, is a promising approach for delivering low-intensity depression care to adolescents living with HIV in resource-constrained settings. Objective: The goal of this study is to develop a prototype, optimized conversational agent (chatbot) that provides mental health education, self-help skills, and care linkage for adolescents living with HIV, and to pilot-test it for feasibility and acceptability.
Methods: Chatbot development comprises 3 phases conducted over 2 years. In the first phase (year 1), formative research will be conducted to understand the views, opinions, and preferences of up to 48 youths aged 10-19 years (6 focus groups of up to 8 adolescents living with HIV per group), their caregivers (5 in-depth interviews), and HIV program personnel (5 in-depth interviews) regarding depression among adolescents living with HIV. We will also investigate the perceived acceptability of a mental health chatbot, including barriers and facilitators to accessing and using a chatbot for depression care by adolescents living with HIV. In the second phase (year 1), we will iteratively program a chatbot using the SmartBot360 software with successive versions (0.1, 0.2, and 0.3), meeting regularly with a Youth Advisory Board composed of adolescents living with HIV who will guide and inform the chatbot development and content to arrive at a prototype version (version 1.0) for pilot-testing. In the third phase (year 2), we will pilot-test the prototype chatbot among 50 adolescents living with HIV naïve to its development. Participants will interact with the chatbot for up to 2 weeks, and data will be collected on the acceptability of the chatbot-delivered depression education and self-help strategies, depression knowledge changes, and intention to seek care linkage. Results: The study was awarded in April 2022, received institutional review board approval in November 2022, received funding in December 2022, and commenced recruitment in March 2023. By the completion of study phases 1 and 2, we expect our chatbot to incorporate the key needs and preferences gathered from the focus groups and interviews. By the completion of study phase 3, we will have assessed the feasibility and acceptability of the prototype chatbot. Study phase 3 began in April 2024. Final results are expected by January 2025 and will be published thereafter.
Conclusions: The study will produce a prototype mental health chatbot developed with and for adolescents living with HIV that will be ready for efficacy testing in a subsequent, larger study. International Registered Report Identifier (IRRID): DERR1-10.2196/55559 UR - https://www.researchprotocols.org/2024/1/e55559 UR - http://dx.doi.org/10.2196/55559 UR - http://www.ncbi.nlm.nih.gov/pubmed/38713501 ID - info:doi/10.2196/55559 ER - TY - JOUR AU - Aguirre, Alyssa AU - Hilsabeck, Robin AU - Smith, Tawny AU - Xie, Bo AU - He, Daqing AU - Wang, Zhendong AU - Zou, Ning PY - 2024/5/6 TI - Assessing the Quality of ChatGPT Responses to Dementia Caregivers' Questions: Qualitative Analysis JO - JMIR Aging SP - e53019 VL - 7 KW - Alzheimer's disease KW - information technology KW - social media KW - neurology KW - dementia KW - Alzheimer disease KW - caregiver KW - ChatGPT N2 - Background: Artificial intelligence (AI) such as ChatGPT by OpenAI holds great promise to improve the quality of life of patients with dementia and their caregivers by providing high-quality responses to their questions about typical dementia behaviors. So far, however, evidence on the quality of such ChatGPT responses is limited. A few recent publications have investigated the quality of ChatGPT responses in other health conditions. Our study is the first to assess ChatGPT using real-world questions asked by dementia caregivers themselves. Objectives: This pilot study examines the potential of ChatGPT-3.5 to provide high-quality information that may enhance dementia care and patient-caregiver education. Methods: Our interprofessional team used a formal rating scale (scoring range: 0-5; the higher the score, the better the quality) to evaluate ChatGPT responses to real-world questions posed by dementia caregivers. We selected 60 posts by dementia caregivers from Reddit, a popular social media platform.
These posts were verified by 3 interdisciplinary dementia clinicians as representing dementia caregivers' desire for information in the areas of memory loss and confusion, aggression, and driving. Word count for posts in the memory loss and confusion category ranged from 71 to 531 (mean 218; median 188), aggression posts ranged from 58 to 602 words (mean 254; median 200), and driving posts ranged from 93 to 550 words (mean 272; median 276). Results: ChatGPT's response quality scores ranged from 3 to 5. Of the 60 responses, 26 (43%) received 5 points, 21 (35%) received 4 points, and 13 (22%) received 3 points, suggesting high quality. ChatGPT obtained consistently high scores in synthesizing information to provide follow-up recommendations (n=58, 96%), with the lowest scores in the area of comprehensiveness (n=38, 63%). Conclusions: ChatGPT provided high-quality responses to complex questions posted by dementia caregivers, but it did have limitations. ChatGPT was unable to anticipate future problems that a human professional might recognize and address in a clinical encounter. At other times, ChatGPT recommended a strategy that the caregiver had already explicitly tried. This pilot study indicates the potential of AI to provide high-quality information to enhance dementia care and patient-caregiver education in tandem with information provided by licensed health care professionals. Evaluating the quality of responses is necessary to ensure that caregivers can make informed decisions. ChatGPT has the potential to transform health care practice by shaping how caregivers receive health information.
UR - https://aging.jmir.org/2024/1/e53019 UR - http://dx.doi.org/10.2196/53019 ID - info:doi/10.2196/53019 ER - TY - JOUR AU - Zhu, Lingxuan AU - Mou, Weiming AU - Hong, Chenglin AU - Yang, Tao AU - Lai, Yancheng AU - Qi, Chang AU - Lin, Anqi AU - Zhang, Jian AU - Luo, Peng PY - 2024/5/6 TI - The Evaluation of Generative AI Should Include Repetition to Assess Stability JO - JMIR Mhealth Uhealth SP - e57978 VL - 12 KW - large language model KW - generative AI KW - ChatGPT KW - artificial intelligence KW - health care UR - https://mhealth.jmir.org/2024/1/e57978 UR - http://dx.doi.org/10.2196/57978 UR - http://www.ncbi.nlm.nih.gov/pubmed/38688841 ID - info:doi/10.2196/57978 ER - TY - JOUR AU - Ruksakulpiwat, Suebsarn AU - Phianhasin, Lalipat AU - Benjasirisan, Chitchanok AU - Ding, Kedong AU - Ajibade, Anuoluwapo AU - Kumar, Ayanesh AU - Stewart, Cassie PY - 2024/5/6 TI - Assessing the Efficacy of ChatGPT Versus Human Researchers in Identifying Relevant Studies on mHealth Interventions for Improving Medication Adherence in Patients With Ischemic Stroke When Conducting Systematic Reviews: Comparative Analysis JO - JMIR Mhealth Uhealth SP - e51526 VL - 12 KW - ChatGPT KW - systematic reviews KW - medication adherence KW - mobile health KW - mHealth KW - ischemic stroke KW - mobile phone N2 - Background: ChatGPT by OpenAI emerged as a potential tool for researchers, aiding in various aspects of research. One such application was the identification of relevant studies in systematic reviews. However, a comprehensive comparison of the efficacy of relevant study identification between human researchers and ChatGPT has not been conducted. Objective: This study aims to compare the efficacy of ChatGPT and human researchers in identifying relevant studies on medication adherence improvement using mobile health interventions in patients with ischemic stroke during systematic reviews. 
Methods: This study used the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Four electronic databases, including CINAHL Plus with Full Text, Web of Science, PubMed, and MEDLINE, were searched to identify articles published from inception until 2023 using search terms based on MeSH (Medical Subject Headings) terms generated by human researchers versus ChatGPT. The authors independently screened the titles, abstracts, and full text of the studies identified through separate searches conducted by human researchers and ChatGPT. The comparison encompassed several aspects, including the ability to retrieve relevant studies, accuracy, efficiency, limitations, and challenges associated with each method. Results: A total of 6 articles identified through search terms generated by human researchers were included in the final analysis, of which 4 (67%) reported improvements in medication adherence after the intervention. However, 33% (2/6) of the included studies did not clearly state whether medication adherence improved after the intervention. A total of 10 studies were included based on search terms generated by ChatGPT, of which 6 (60%) overlapped with studies identified by human researchers. Regarding the impact of mobile health interventions on medication adherence, most included studies (8/10, 80%) based on search terms generated by ChatGPT reported improvements in medication adherence after the intervention. However, 20% (2/10) of the studies did not clearly state whether medication adherence improved after the intervention. The precision in accurately identifying relevant studies was higher in human researchers (0.86) than in ChatGPT (0.77). This is consistent with the percentage of relevance, where human researchers (9.8%) demonstrated a higher percentage of relevance than ChatGPT (3%). 
However, when considering the time required for both humans and ChatGPT to identify relevant studies, ChatGPT substantially outperformed human researchers as it took less time to identify relevant studies. Conclusions: Our comparative analysis highlighted the strengths and limitations of both approaches. Ultimately, the choice between human researchers and ChatGPT depends on the specific requirements and objectives of each review, but the collaborative synergy of both approaches holds the potential to advance evidence-based research and decision-making in the health care field. UR - https://mhealth.jmir.org/2024/1/e51526 UR - http://dx.doi.org/10.2196/51526 UR - http://www.ncbi.nlm.nih.gov/pubmed/38710069 ID - info:doi/10.2196/51526 ER - TY - JOUR AU - Ambrosio, Graca Maria Da AU - Lachman, M. Jamie AU - Zinzer, Paula AU - Gwebu, Hlengiwe AU - Vyas, Seema AU - Vallance, Inge AU - Calderon, Francisco AU - Gardner, Frances AU - Markle, Laurie AU - Stern, David AU - Facciola, Chiara AU - Schley, Anne AU - Danisa, Nompumelelo AU - Brukwe, Kanyisile AU - Melendez-Torres, GJ PY - 2024/5/3 TI - A Factorial Randomized Controlled Trial to Optimize User Engagement With a Chatbot-Led Parenting Intervention: Protocol for the ParentText Optimisation Trial JO - JMIR Res Protoc SP - e52145 VL - 13 KW - parenting intervention KW - chatbot-led public health intervention KW - engagement KW - implementation science KW - mobile phone N2 - Background: Violence against children (VAC) is a serious public health concern with long-lasting adverse effects. Evidence-based parenting programs are one effective means to prevent VAC; however, these interventions are not scalable in their typical in-person group format, especially in low- and middle-income countries where the need is greatest. 
While digital delivery, including via chatbots, offers a scalable and cost-effective means to scale up parenting programs within these settings, it is crucial to understand the key pillars of user engagement to ensure their effective implementation. Objective: This study aims to investigate the most effective and cost-effective combination of external components to optimize user engagement with ParentText, an open-source chatbot-led parenting intervention to prevent VAC in Mpumalanga, South Africa. Methods: This study will use a mixed methods design incorporating a 2 × 2 factorial cluster-randomized controlled trial and qualitative interviews. Parents of adolescent girls (32 clusters, 120 participants [60 parents and 60 girls aged 10 to 17 years] per cluster; N=3840 total participants) will be recruited from the Ehlanzeni and Nkangala districts of Mpumalanga. Clusters will be randomly assigned to receive 1 of the 4 engagement packages that include ParentText alone or combined with in-person sessions and a facilitated WhatsApp support group. Quantitative data collected will include pretest-posttest parent- and adolescent-reported surveys, facilitator-reported implementation data, and digitally tracked engagement data. Qualitative data will be collected from parents and facilitators through in-person or over-the-phone individual semistructured interviews and used to expand the interpretation and understanding of the quantitative findings. Results: Recruitment and data collection started in August 2023 and were finalized in November 2023. The total number of participants enrolled in the study is 1009, with 744 caregivers having completed onboarding to the chatbot-led intervention. Female participants represent 92.96% (938/1009) of the sample population, whereas male participants represent 7.03% (71/1009). The average participant age is 43 (SD 9) years. 
Conclusions: The ParentText Optimisation Trial is the first study to rigorously test engagement with a chatbot-led parenting intervention in a low- or middle-income country. The results of this study will inform the final selection of external delivery components to support engagement with ParentText in preparation for further evaluation in a randomized controlled trial in 2024. Trial Registration: Open Science Framework (OSF); https://doi.org/10.17605/OSF.IO/WFXNE International Registered Report Identifier (IRRID): DERR1-10.2196/52145 UR - https://www.researchprotocols.org/2024/1/e52145 UR - http://dx.doi.org/10.2196/52145 UR - http://www.ncbi.nlm.nih.gov/pubmed/38700935 ID - info:doi/10.2196/52145 ER - TY - JOUR AU - Leas, C. Eric AU - Ayers, W. John AU - Desai, Nimit AU - Dredze, Mark AU - Hogarth, Michael AU - Smith, M. Davey PY - 2024/5/2 TI - Using Large Language Models to Support Content Analysis: A Case Study of ChatGPT for Adverse Event Detection JO - J Med Internet Res SP - e52499 VL - 26 KW - adverse events KW - artificial intelligence KW - AI KW - text analysis KW - annotation KW - ChatGPT KW - LLM KW - large language model KW - cannabis KW - delta-8-THC KW - delta-8-tetrahydrocannabiol UR - https://www.jmir.org/2024/1/e52499 UR - http://dx.doi.org/10.2196/52499 UR - http://www.ncbi.nlm.nih.gov/pubmed/38696245 ID - info:doi/10.2196/52499 ER - TY - JOUR AU - Busch, Felix AU - Han, Tianyu AU - Makowski, R. Marcus AU - Truhn, Daniel AU - Bressem, K. 
Keno AU - Adams, Lisa PY - 2024/5/1 TI - Integrating Text and Image Analysis: Exploring GPT-4V's Capabilities in Advanced Radiological Applications Across Subspecialties JO - J Med Internet Res SP - e54948 VL - 26 KW - GPT-4 KW - ChatGPT KW - Generative Pre-Trained Transformer KW - multimodal large language models KW - artificial intelligence KW - AI applications in medicine KW - diagnostic radiology KW - clinical decision support systems KW - generative AI KW - medical image analysis UR - https://www.jmir.org/2024/1/e54948 UR - http://dx.doi.org/10.2196/54948 UR - http://www.ncbi.nlm.nih.gov/pubmed/38691404 ID - info:doi/10.2196/54948 ER - TY - JOUR AU - He, Wenjie AU - Zhang, Wenyan AU - Jin, Ya AU - Zhou, Qiang AU - Zhang, Huadan AU - Xia, Qing PY - 2024/4/30 TI - Physician Versus Large Language Model Chatbot Responses to Web-Based Questions From Autistic Patients in Chinese: Cross-Sectional Comparative Analysis JO - J Med Internet Res SP - e54706 VL - 26 KW - artificial intelligence KW - chatbot KW - ChatGPT KW - ERNIE Bot KW - autism N2 - Background: There is a dearth of feasibility assessments regarding using large language models (LLMs) for responding to inquiries from autistic patients within a Chinese-language context. Despite Chinese being one of the most widely spoken languages globally, the predominant research focus on applying these models in the medical field has been on English-speaking populations. Objective: This study aims to assess the effectiveness of LLM chatbots, specifically ChatGPT-4 (OpenAI) and ERNIE Bot (version 2.2.3; Baidu, Inc), one of the most advanced LLMs in China, in addressing inquiries from autistic individuals in a Chinese setting. Methods: For this study, we gathered data from DXY, a widely acknowledged, web-based, medical consultation platform in China with a user base of over 100 million individuals.
A total of 100 patient consultation samples were rigorously selected from January 2018 to August 2023, amounting to 239 questions extracted from publicly available autism-related documents on the platform. To maintain objectivity, both the original questions and responses were anonymized and randomized. An evaluation team of 3 chief physicians assessed the responses across 4 dimensions: relevance, accuracy, usefulness, and empathy. The team completed 717 evaluations. The team initially identified the best response and then used a Likert scale with 5 response categories to gauge the responses, each representing a distinct level of quality. Finally, we compared the responses collected from different sources. Results: Among the 717 evaluations conducted, 46.86% (95% CI 43.21%-50.51%) of assessors displayed varying preferences for responses from physicians, with 34.87% (95% CI 31.38%-38.36%) of assessors favoring ChatGPT and 18.27% (95% CI 15.44%-21.10%) of assessors favoring ERNIE Bot. The average relevance scores for physicians, ChatGPT, and ERNIE Bot were 3.75 (95% CI 3.69-3.82), 3.69 (95% CI 3.63-3.74), and 3.41 (95% CI 3.35-3.46), respectively. Physicians (3.66, 95% CI 3.60-3.73) and ChatGPT (3.73, 95% CI 3.69-3.77) demonstrated higher accuracy ratings compared to ERNIE Bot (3.52, 95% CI 3.47-3.57). In terms of usefulness scores, physicians (3.54, 95% CI 3.47-3.62) received higher ratings than ChatGPT (3.40, 95% CI 3.34-3.47) and ERNIE Bot (3.05, 95% CI 2.99-3.12). Finally, concerning the empathy dimension, ChatGPT (3.64, 95% CI 3.57-3.71) outperformed physicians (3.13, 95% CI 3.04-3.21) and ERNIE Bot (3.11, 95% CI 3.04-3.18). Conclusions: In this cross-sectional study, physicians' responses exhibited superiority in the present Chinese-language context. Nonetheless, LLMs can provide valuable medical guidance to autistic patients and may even surpass physicians in demonstrating empathy.
However, it is crucial to acknowledge that further optimization and research are imperative prerequisites before the effective integration of LLMs in clinical settings across diverse linguistic environments can be realized. Trial Registration: Chinese Clinical Trial Registry ChiCTR2300074655; https://www.chictr.org.cn/bin/project/edit?pid=199432 UR - https://www.jmir.org/2024/1/e54706 UR - http://dx.doi.org/10.2196/54706 UR - http://www.ncbi.nlm.nih.gov/pubmed/38687566 ID - info:doi/10.2196/54706 ER - TY - JOUR AU - Rojas, Marcos AU - Rojas, Marcelo AU - Burgess, Valentina AU - Toro-Pérez, Javier AU - Salehi, Shima PY - 2024/4/29 TI - Exploring the Performance of ChatGPT Versions 3.5, 4, and 4 With Vision in the Chilean Medical Licensing Examination: Observational Study JO - JMIR Med Educ SP - e55048 VL - 10 KW - artificial intelligence KW - AI KW - generative artificial intelligence KW - medical education KW - ChatGPT KW - EUNACOM KW - medical licensure KW - medical license KW - medical licensing exam N2 - Background: The deployment of OpenAI's ChatGPT-3.5 and its subsequent versions, ChatGPT-4 and ChatGPT-4 With Vision (4V; also known as "GPT-4 Turbo With Vision"), has notably influenced the medical field. Having demonstrated remarkable performance in medical examinations globally, these models show potential for educational applications. However, their effectiveness in non-English contexts, particularly in Chile's medical licensing examinations, a critical step for medical practitioners in Chile, is less explored. This gap highlights the need to evaluate ChatGPT's adaptability to diverse linguistic and cultural contexts. Objective: This study aims to evaluate the performance of ChatGPT versions 3.5, 4, and 4V in the EUNACOM (Examen Único Nacional de Conocimientos de Medicina), a major medical examination in Chile.
Methods: Three official practice drills (540 questions) from the University of Chile, mirroring the EUNACOM's structure and difficulty, were used to test ChatGPT versions 3.5, 4, and 4V. The 3 ChatGPT versions were provided 3 attempts for each drill. Responses to questions during each attempt were systematically categorized and analyzed to assess their accuracy rate. Results: All versions of ChatGPT passed the EUNACOM drills. Specifically, versions 4 and 4V outperformed version 3.5, achieving average accuracy rates of 79.32% and 78.83%, respectively, compared to 57.53% for version 3.5 (P<.001). Version 4V, however, did not outperform version 4 (P=.73), despite the additional visual capabilities. We also evaluated ChatGPT's performance in different medical areas of the EUNACOM and found that versions 4 and 4V consistently outperformed version 3.5. Across the different medical areas, version 3.5 displayed the highest accuracy in psychiatry (69.84%), while versions 4 and 4V achieved the highest accuracy in surgery (90.00% and 86.11%, respectively). Versions 3.5 and 4 had the lowest performance in internal medicine (52.74% and 75.62%, respectively), while version 4V had the lowest performance in public health (74.07%). Conclusions: This study reveals ChatGPT's ability to pass the EUNACOM, with distinct proficiencies across versions 3.5, 4, and 4V. Notably, advancements in artificial intelligence (AI) have not significantly led to enhancements in performance on image-based questions. The variations in proficiency across medical fields suggest the need for more nuanced AI training. Additionally, the study underscores the importance of exploring innovative approaches to using AI to augment human cognition and enhance the learning process. Such advancements have the potential to significantly influence medical education, fostering not only knowledge acquisition but also the development of critical thinking and problem-solving skills among health care professionals.
UR - https://mededu.jmir.org/2024/1/e55048 UR - http://dx.doi.org/10.2196/55048 ID - info:doi/10.2196/55048 ER - TY - JOUR AU - Thunström, Osmanovic Almira AU - Carlsen, Krage Hanne AU - Ali, Lilas AU - Larson, Tomas AU - Hellström, Andreas AU - Steingrimsson, Steinn PY - 2024/4/29 TI - Usability Comparison Among Healthy Participants of an Anthropomorphic Digital Human and a Text-Based Chatbot as a Responder to Questions on Mental Health: Randomized Controlled Trial JO - JMIR Hum Factors SP - e54581 VL - 11 KW - chatbot KW - chatbots KW - chat-bot KW - chat-bots KW - text-only chatbot KW - voice-only chatbot KW - mental health KW - mental illness KW - mental disease KW - mental diseases KW - mental illnesses KW - mental health service KW - mental health services KW - interface KW - system usability KW - usability KW - digital health KW - machine learning KW - ML KW - artificial intelligence KW - AI KW - algorithm KW - algorithms KW - NLP KW - natural language processing N2 - Background: The use of chatbots in mental health support has increased exponentially in recent years, with studies showing that they may be effective in treating mental health problems. More recently, the use of visual avatars called digital humans has been introduced. Digital humans have the capability to use facial expressions as another dimension in human-computer interactions. It is important to study the difference in emotional response and usability preferences between text-based chatbots and digital humans for interacting with mental health services. Objective: This study aims to explore to what extent a digital human interface and a text-only chatbot interface differed in usability when tested by healthy participants, using BETSY (Behavior, Emotion, Therapy System, and You), which uses 2 distinct interfaces: a digital human with anthropomorphic features and a text-only user interface.
We also set out to explore how chatbot-generated conversations on mental health (specific to each interface) affected self-reported feelings and biometrics. Methods: We explored to what extent a digital human with anthropomorphic features differed from a traditional text-only chatbot regarding perception of usability through the System Usability Scale, emotional reactions through electroencephalography, and feelings of closeness. Healthy participants (n=45) were randomized to 2 groups that used a digital human with anthropomorphic features (n=25) or a text-only chatbot with no such features (n=20). The groups were compared by linear regression analysis and t tests. Results: No differences were observed between the text-only and digital human groups regarding demographic features. The mean System Usability Scale score was 75.34 (SD 10.01; range 57-90) for the text-only chatbot versus 64.80 (SD 14.14; range 40-90) for the digital human interface. Both groups scored their respective chatbot interfaces as average or above average in usability. Women were more likely to report feeling annoyed by BETSY. Conclusions: The text-only chatbot was perceived as significantly more user-friendly than the digital human, although there were no significant differences in electroencephalography measurements. Male participants exhibited lower levels of annoyance with both interfaces, contrary to previously reported findings. 
UR - https://humanfactors.jmir.org/2024/1/e54581 UR - http://dx.doi.org/10.2196/54581 UR - http://www.ncbi.nlm.nih.gov/pubmed/38683664 ID - info:doi/10.2196/54581 ER - TY - JOUR AU - Wang, Shangqiguo AU - Mo, Changgeng AU - Chen, Yuan AU - Dai, Xiaolu AU - Wang, Huiyi AU - Shen, Xiaoli PY - 2024/4/26 TI - Exploring the Performance of ChatGPT-4 in the Taiwan Audiologist Qualification Examination: Preliminary Observational Study Highlighting the Potential of AI Chatbots in Hearing Care JO - JMIR Med Educ SP - e55595 VL - 10 KW - ChatGPT KW - medical education KW - artificial intelligence KW - AI KW - audiology KW - hearing care KW - natural language processing KW - large language model KW - Taiwan KW - hearing KW - hearing specialist KW - audiologist KW - examination KW - information accuracy KW - educational technology KW - healthcare services KW - chatbot KW - health care services N2 - Background: Artificial intelligence (AI) chatbots, such as ChatGPT-4, have shown immense potential for application across various aspects of medicine, including medical education, clinical practice, and research. Objective: This study aimed to evaluate the performance of ChatGPT-4 in the 2023 Taiwan Audiologist Qualification Examination, thereby preliminarily exploring the potential utility of AI chatbots in the fields of audiology and hearing care services. Methods: ChatGPT-4 was tasked to provide answers and reasoning for the 2023 Taiwan Audiologist Qualification Examination. The examination encompassed six subjects: (1) basic auditory science, (2) behavioral audiology, (3) electrophysiological audiology, (4) principles and practice of hearing devices, (5) health and rehabilitation of the auditory and balance systems, and (6) auditory and speech communication disorders (including professional ethics). Each subject included 50 multiple-choice questions, with the exception of behavioral audiology, which had 49 questions, amounting to a total of 299 questions. 
Results: The correct answer rates across the 6 subjects were as follows: 88% for basic auditory science, 63% for behavioral audiology, 58% for electrophysiological audiology, 72% for principles and practice of hearing devices, 80% for health and rehabilitation of the auditory and balance systems, and 86% for auditory and speech communication disorders (including professional ethics). The overall accuracy rate for the 299 questions was 75%, which surpasses the examination's passing criteria of an average 60% accuracy rate across all subjects. A comprehensive review of ChatGPT-4's responses indicated that incorrect answers were predominantly due to information errors. Conclusions: ChatGPT-4 demonstrated a robust performance in the Taiwan Audiologist Qualification Examination, showcasing effective logical reasoning skills. Our results suggest that with enhanced information accuracy, ChatGPT-4's performance could be further improved. This study indicates significant potential for the application of AI chatbots in audiology and hearing care services. UR - https://mededu.jmir.org/2024/1/e55595 UR - http://dx.doi.org/10.2196/55595 ID - info:doi/10.2196/55595 ER - TY - JOUR AU - Lv, Xiaolei AU - Zhang, Xiaomeng AU - Li, Yuan AU - Ding, Xinxin AU - Lai, Hongchang AU - Shi, Junyu PY - 2024/4/25 TI - Leveraging Large Language Models for Improved Patient Access and Self-Management: Assessor-Blinded Comparison Between Expert- and AI-Generated Content JO - J Med Internet Res SP - e55847 VL - 26 KW - large language model KW - artificial intelligence KW - public oral health KW - health care access KW - patient education N2 - Background: While large language models (LLMs) such as ChatGPT and Google Bard have shown significant promise in various fields, their broader impact on enhancing patient health care access and quality, particularly in specialized domains such as oral health, requires comprehensive evaluation.
Objective: This study aims to assess the effectiveness of Google Bard, ChatGPT-3.5, and ChatGPT-4 in offering recommendations for common oral health issues, benchmarked against responses from human dental experts. Methods: This comparative analysis used 40 questions derived from patient surveys on prevalent oral diseases, which were executed in a simulated clinical environment. Responses, obtained from both human experts and LLMs, were subject to a blinded evaluation process by experienced dentists and lay users, focusing on readability, appropriateness, harmlessness, comprehensiveness, intent capture, and helpfulness. Additionally, the stability of artificial intelligence responses was assessed by submitting each question 3 times under consistent conditions. Results: Google Bard excelled in readability but lagged in appropriateness when compared to human experts (mean 8.51, SD 0.37 vs mean 9.60, SD 0.33; P=.03). ChatGPT-3.5 and ChatGPT-4, however, performed comparably with human experts in terms of appropriateness (mean 8.96, SD 0.35 and mean 9.34, SD 0.47, respectively), with ChatGPT-4 demonstrating the highest stability and reliability. Furthermore, all 3 LLMs received superior harmlessness scores comparable to human experts, with lay users finding minimal differences in helpfulness and intent capture between the artificial intelligence models and human responses. Conclusions: LLMs, particularly ChatGPT-4, show potential in oral health care, providing patient-centric information for enhancing patient education and clinical care. The observed performance variations underscore the need for ongoing refinement and ethical considerations in health care settings. Future research should focus on developing strategies for the safe integration of LLMs in health care settings.
UR - https://www.jmir.org/2024/1/e55847 UR - http://dx.doi.org/10.2196/55847 UR - http://www.ncbi.nlm.nih.gov/pubmed/38663010 ID - info:doi/10.2196/55847 ER - TY - JOUR AU - Choudhury, Avishek AU - Chaudhry, Zaira PY - 2024/4/25 TI - Large Language Models and User Trust: Consequence of Self-Referential Learning Loop and the Deskilling of Health Care Professionals JO - J Med Internet Res SP - e56764 VL - 26 KW - trust KW - ChatGPT KW - human factors KW - healthcare KW - LLMs KW - large language models KW - LLM user trust KW - AI accountability KW - artificial intelligence KW - AI technology KW - technologies KW - effectiveness KW - policy KW - medical student KW - medical students KW - risk factor KW - quality of care KW - healthcare professional KW - healthcare professionals KW - human element UR - https://www.jmir.org/2024/1/e56764 UR - http://dx.doi.org/10.2196/56764 UR - http://www.ncbi.nlm.nih.gov/pubmed/38662419 ID - info:doi/10.2196/56764 ER - TY - JOUR AU - Kernberg, Annessa AU - Gold, A. Jeffrey AU - Mohan, Vishnu PY - 2024/4/22 TI - Using ChatGPT-4 to Create Structured Medical Notes From Audio Recordings of Physician-Patient Encounters: Comparative Study JO - J Med Internet Res SP - e54419 VL - 26 KW - generative AI KW - generative artificial intelligence KW - ChatGPT KW - simulation KW - large language model KW - clinical documentation KW - quality KW - accuracy KW - reproducibility KW - publicly available KW - medical note KW - medical notes KW - generation KW - medical documentation KW - documentation KW - documentations KW - AI KW - artificial intelligence KW - transcript KW - transcripts KW - ChatGPT-4 N2 - Background: Medical documentation plays a crucial role in clinical practice, facilitating accurate patient management and communication among health care professionals. However, inaccuracies in medical notes can lead to miscommunication and diagnostic errors. Additionally, the demands of documentation contribute to physician burnout. 
Although intermediaries like medical scribes and speech recognition software have been used to ease this burden, they have limitations in terms of accuracy and addressing provider-specific metrics. The integration of ambient artificial intelligence (AI)-powered solutions offers a promising way to improve documentation while fitting seamlessly into existing workflows. Objective: This study aims to assess the accuracy and quality of Subjective, Objective, Assessment, and Plan (SOAP) notes generated by ChatGPT-4, an AI model, using established transcripts of History and Physical Examination as the gold standard. We seek to identify potential errors and evaluate the model's performance across different categories. Methods: We conducted simulated patient-provider encounters representing various ambulatory specialties and transcribed the audio files. Key reportable elements were identified, and ChatGPT-4 was used to generate SOAP notes based on these transcripts. Three versions of each note were created and compared to the gold standard via chart review; errors generated from the comparison were categorized as omissions, incorrect information, or additions. We compared the accuracy of data elements across versions, transcript length, and data categories. Additionally, we assessed note quality using the Physician Documentation Quality Instrument (PDQI) scoring system. Results: Although ChatGPT-4 consistently generated SOAP-style notes, there were, on average, 23.6 errors per clinical case, with errors of omission (86%) being the most common, followed by addition errors (10.5%) and inclusion of incorrect facts (3.2%). There was significant variance between replicates of the same case, with only 52.9% of data elements reported correctly across all 3 replicates. The accuracy of data elements varied across cases, with the highest accuracy observed in the "Objective" section.
Consequently, the measure of note quality, assessed by PDQI, demonstrated intra- and intercase variance. Finally, the accuracy of ChatGPT-4 was inversely correlated to both the transcript length (P=.05) and the number of scorable data elements (P=.05). Conclusions: Our study reveals substantial variability in errors, accuracy, and note quality generated by ChatGPT-4. Errors were not limited to specific sections, and the inconsistency in error types across replicates complicated predictability. Transcript length and data complexity were inversely correlated with note accuracy, raising concerns about the model's effectiveness in handling complex medical cases. The quality and reliability of clinical notes produced by ChatGPT-4 do not meet the standards required for clinical use. Although AI holds promise in health care, caution should be exercised before widespread adoption. Further research is needed to address accuracy, variability, and potential errors. ChatGPT-4, while valuable in various applications, should not be considered a safe alternative to human-generated clinical documentation at this time. UR - https://www.jmir.org/2024/1/e54419 UR - http://dx.doi.org/10.2196/54419 UR - http://www.ncbi.nlm.nih.gov/pubmed/38648636 ID - info:doi/10.2196/54419 ER - TY - JOUR AU - Pham, Cecilia AU - Govender, Romi AU - Tehami, Salik AU - Chavez, Summer AU - Adepoju, E.
Omolola AU - Liaw, Winston PY - 2024/4/22 TI - ChatGPT's Performance in Cardiac Arrest and Bradycardia Simulations Using the American Heart Association's Advanced Cardiovascular Life Support Guidelines: Exploratory Study JO - J Med Internet Res SP - e55037 VL - 26 KW - ChatGPT KW - artificial intelligence KW - AI KW - large language model KW - LLM KW - cardiac arrest KW - bradycardia KW - simulation KW - advanced cardiovascular life support KW - ACLS KW - bradycardia simulations KW - America KW - American KW - heart association KW - cardiac KW - life support KW - exploratory study KW - heart KW - heart attack KW - clinical decision support KW - diagnostics KW - algorithms N2 - Background: ChatGPT is the most advanced large language model to date, with prior iterations having passed medical licensing examinations, provided clinical decision support, and improved diagnostics. Although limited, past studies of ChatGPT's performance found that artificial intelligence could pass the American Heart Association's advanced cardiovascular life support (ACLS) examinations with modifications. ChatGPT's accuracy has not been studied in more complex clinical scenarios. As heart disease and cardiac arrest remain leading causes of morbidity and mortality in the United States, finding technologies that help increase adherence to ACLS algorithms, which improves survival outcomes, is critical. Objective: This study aims to examine the accuracy of ChatGPT in following ACLS guidelines for bradycardia and cardiac arrest. Methods: We evaluated the accuracy of ChatGPT's responses to 2 simulations based on the 2020 American Heart Association ACLS guidelines with 3 primary outcomes of interest: the mean individual step accuracy, the accuracy score per simulation attempt, and the accuracy score for each algorithm. For each simulation step, ChatGPT was scored for correctness (1 point) or incorrectness (0 points). Each simulation was conducted 20 times.
Results: ChatGPT's median accuracy for each step was 85% (IQR 40%-100%) for cardiac arrest and 30% (IQR 13%-81%) for bradycardia. ChatGPT's median accuracy over 20 simulation attempts for cardiac arrest was 69% (IQR 67%-74%) and for bradycardia was 42% (IQR 33%-50%). We found that ChatGPT's outputs varied despite consistent input, the same actions were persistently missed, repetitive overemphasis hindered guidance, and erroneous medication information was presented. Conclusions: This study highlights the need for consistent and reliable guidance to prevent potential medical errors and optimize the application of ChatGPT to enhance its reliability and effectiveness in clinical practice. UR - https://www.jmir.org/2024/1/e55037 UR - http://dx.doi.org/10.2196/55037 UR - http://www.ncbi.nlm.nih.gov/pubmed/38648098 ID - info:doi/10.2196/55037 ER - TY - JOUR AU - Mishra, Vishala AU - Sarraju, Ashish AU - Kalwani, M. Neil AU - Dexter, P. Joseph PY - 2024/4/22 TI - Evaluation of Prompts to Simplify Cardiovascular Disease Information Generated Using a Large Language Model: Cross-Sectional Study JO - J Med Internet Res SP - e55388 VL - 26 KW - artificial intelligence KW - ChatGPT KW - GPT KW - digital health KW - large language model KW - NLP KW - language model KW - language models KW - prompt engineering KW - health communication KW - generative KW - health literacy KW - natural language processing KW - patient-physician communication KW - prevention KW - cardiology KW - cardiovascular KW - heart KW - education KW - educational KW - human-in-the-loop KW - machine learning UR - https://www.jmir.org/2024/1/e55388 UR - http://dx.doi.org/10.2196/55388 UR - http://www.ncbi.nlm.nih.gov/pubmed/38648104 ID - info:doi/10.2196/55388 ER - TY - JOUR AU - King, C. Ryan AU - Samaan, S. Jamil AU - Yeo, Hui Yee AU - Peng, Yuxin AU - Kunkel, C. David AU - Habib, A.
Ali AU - Ghashghaei, Roxana PY - 2024/4/19 TI - A Multidisciplinary Assessment of ChatGPT's Knowledge of Amyloidosis: Observational Study JO - JMIR Cardio SP - e53421 VL - 8 KW - amyloidosis KW - ChatGPT KW - large language models KW - cardiology KW - gastroenterology KW - neurology KW - artificial intelligence KW - multidisciplinary care KW - assessment KW - patient education KW - large language model KW - accuracy KW - reliability KW - accessibility KW - educational resources KW - dissemination KW - gastroenterologist KW - cardiologist KW - medical society KW - institution KW - institutions KW - Facebook KW - neurologist KW - reproducibility KW - amyloidosis-related N2 - Background: Amyloidosis, a rare multisystem condition, often requires complex, multidisciplinary care. Its low prevalence underscores the importance of efforts to ensure the availability of high-quality patient education materials for better outcomes. ChatGPT (OpenAI) is a large language model powered by artificial intelligence that offers a potential avenue for disseminating accurate, reliable, and accessible educational resources for both patients and providers. Its user-friendly interface, engaging conversational responses, and the capability for users to ask follow-up questions make it a promising future tool in delivering accurate and tailored information to patients. Objective: We performed a multidisciplinary assessment of the accuracy, reproducibility, and readability of ChatGPT in answering questions related to amyloidosis. Methods: In total, 98 amyloidosis questions related to cardiology, gastroenterology, and neurology were curated from medical societies, institutions, and amyloidosis Facebook support groups and inputted into ChatGPT-3.5 and ChatGPT-4. Cardiology- and gastroenterology-related responses were independently graded by a board-certified cardiologist and gastroenterologist, respectively, who specialize in amyloidosis.
These 2 reviewers (RG and DCK) also graded general questions for which disagreements were resolved with discussion. Neurology-related responses were graded by a board-certified neurologist (AAH) who specializes in amyloidosis. Reviewers used the following grading scale: (1) comprehensive, (2) correct but inadequate, (3) some correct and some incorrect, and (4) completely incorrect. Questions were stratified by categories for further analysis. Reproducibility was assessed by inputting each question twice into each model. The readability of ChatGPT-4 responses was also evaluated using the Textstat library in Python (Python Software Foundation) and the Textstat readability package in R software (R Foundation for Statistical Computing). Results: ChatGPT-4 (n=98) provided 93 (95%) responses with accurate information, and 82 (84%) were comprehensive. ChatGPT-3.5 (n=83) provided 74 (89%) responses with accurate information, and 66 (79%) were comprehensive. When examined by question category, ChatGPT-4 and ChatGPT-3.5 provided 53 (95%) and 48 (86%) comprehensive responses, respectively, to "general questions" (n=56). When examined by subject, ChatGPT-4 and ChatGPT-3.5 performed best in response to cardiology questions (n=12) with both models producing 10 (83%) comprehensive responses. For gastroenterology (n=15), ChatGPT-4 received comprehensive grades for 9 (60%) responses, and ChatGPT-3.5 provided 8 (53%) responses. Overall, 96 of 98 (98%) responses for ChatGPT-4 and 73 of 83 (88%) for ChatGPT-3.5 were reproducible. The readability of ChatGPT-4's responses ranged from 10th to beyond graduate US grade levels with an average of 15.5 (SD 1.9). Conclusions: Large language models are a promising tool for accurate and reliable health information for patients living with amyloidosis. However, ChatGPT's responses exceeded the American Medical Association's recommended fifth- to sixth-grade reading level.
Future studies focusing on improving response accuracy and readability are warranted. Prior to widespread implementation, the technology's limitations and ethical implications must be further explored to ensure patient safety and equitable implementation. UR - https://cardio.jmir.org/2024/1/e53421 UR - http://dx.doi.org/10.2196/53421 UR - http://www.ncbi.nlm.nih.gov/pubmed/38640472 ID - info:doi/10.2196/53421 ER - TY - JOUR AU - He, Zhe AU - Bhasuran, Balu AU - Jin, Qiao AU - Tian, Shubo AU - Hanna, Karim AU - Shavor, Cindy AU - Arguello, Garcia Lisbeth AU - Murray, Patrick AU - Lu, Zhiyong PY - 2024/4/17 TI - Quality of Answers of Generative Large Language Models Versus Peer Users for Interpreting Laboratory Test Results for Lay Patients: Evaluation Study JO - J Med Internet Res SP - e56655 VL - 26 KW - large language models KW - generative artificial intelligence KW - generative AI KW - ChatGPT KW - laboratory test results KW - patient education KW - natural language processing N2 - Background: Although patients have easy access to their electronic health records and laboratory test result data through patient portals, laboratory test results are often confusing and hard to understand. Many patients turn to web-based forums or question-and-answer (Q&A) sites to seek advice from their peers. The quality of answers from social Q&A sites on health-related questions varies significantly, and not all responses are accurate or reliable. Large language models (LLMs) such as ChatGPT have opened a promising avenue for patients to have their questions answered. Objective: We aimed to assess the feasibility of using LLMs to generate relevant, accurate, helpful, and unharmful responses to laboratory test-related questions asked by patients and identify potential issues that can be mitigated using augmentation approaches. Methods: We collected laboratory test result-related Q&A data from Yahoo! Answers and selected 53 Q&A pairs for this study.
Using the LangChain framework and ChatGPT web portal, we generated responses to the 53 questions from 5 LLMs: GPT-4, GPT-3.5, LLaMA 2, MedAlpaca, and ORCA_mini. We assessed the similarity of their answers using standard Q&A similarity-based evaluation metrics, including Recall-Oriented Understudy for Gisting Evaluation, Bilingual Evaluation Understudy, Metric for Evaluation of Translation With Explicit Ordering, and Bidirectional Encoder Representations from Transformers Score. We used an LLM-based evaluator to judge whether a target model had higher quality in terms of relevance, correctness, helpfulness, and safety than the baseline model. We performed a manual evaluation with medical experts for all the responses to 7 selected questions on the same 4 aspects. Results: Regarding the similarity of the responses from the 4 LLMs, with the GPT-4 output used as the reference answer, the responses from GPT-3.5 were the most similar, followed by those from LLaMA 2, ORCA_mini, and MedAlpaca. Human answers from Yahoo data were scored the lowest and, thus, as the least similar to GPT-4-generated answers. The results of the win rate and medical expert evaluation both showed that GPT-4's responses achieved better scores than all the other LLM responses and human responses on all 4 aspects (relevance, correctness, helpfulness, and safety). LLM responses occasionally also suffered from a lack of interpretation in one's medical context, incorrect statements, and lack of references. Conclusions: By evaluating LLMs in generating responses to patients' laboratory test result-related questions, we found that, compared to the other 4 LLMs and human answers from a Q&A website, GPT-4's responses were more accurate, helpful, relevant, and safer. There were cases in which GPT-4 responses were inaccurate and not individualized.
We identified a number of ways to improve the quality of LLM responses, including prompt engineering, prompt augmentation, retrieval-augmented generation, and response evaluation. UR - https://www.jmir.org/2024/1/e56655 UR - http://dx.doi.org/10.2196/56655 UR - http://www.ncbi.nlm.nih.gov/pubmed/38630520 ID - info:doi/10.2196/56655 ER - TY - JOUR AU - Herrmann-Werner, Anne AU - Festl-Wietek, Teresa AU - Holderried, Friederike AU - Herschbach, Lea AU - Griewatz, Jan AU - Masters, Ken AU - Zipfel, Stephan AU - Mahling, Moritz PY - 2024/4/16 TI - Authors' Reply: "Evaluating GPT-4's Cognitive Functions Through the Bloom Taxonomy: Insights and Clarifications" JO - J Med Internet Res SP - e57778 VL - 26 KW - answer KW - artificial intelligence KW - assessment KW - Bloom's taxonomy KW - ChatGPT KW - classification KW - error KW - exam KW - examination KW - generative KW - GPT-4 KW - Generative Pre-trained Transformer 4 KW - language model KW - learning outcome KW - LLM KW - MCQ KW - medical education KW - medical exam KW - multiple-choice question KW - natural language processing KW - NLP KW - psychosomatic KW - question KW - response KW - taxonomy UR - https://www.jmir.org/2024/1/e57778 UR - http://dx.doi.org/10.2196/57778 UR - http://www.ncbi.nlm.nih.gov/pubmed/38625723 ID - info:doi/10.2196/57778 ER - TY - JOUR AU - Huang, Kuan-Ju PY - 2024/4/16 TI - Evaluating GPT-4's Cognitive Functions Through the Bloom Taxonomy: Insights and Clarifications JO - J Med Internet Res SP - e56997 VL - 26 KW - artificial intelligence KW - ChatGPT KW - Bloom taxonomy KW - AI KW - cognition UR - https://www.jmir.org/2024/1/e56997 UR - http://dx.doi.org/10.2196/56997 UR - http://www.ncbi.nlm.nih.gov/pubmed/38625725 ID - info:doi/10.2196/56997 ER - TY - JOUR AU - Bragazzi, Luigi Nicola AU - Garbarino, Sergio PY - 2024/4/16 TI - Assessing the Accuracy of Generative Conversational Artificial Intelligence in Debunking Sleep Health Myths: Mixed Methods Comparative Study With Expert Analysis JO -
JMIR Form Res SP - e55762 VL - 8 KW - sleep KW - sleep health KW - sleep-related disbeliefs KW - generative conversational artificial intelligence KW - chatbot KW - ChatGPT KW - misinformation KW - artificial intelligence KW - comparative study KW - expert analysis KW - adequate sleep KW - well-being KW - sleep trackers KW - sleep health education KW - sleep-related KW - chronic disease KW - healthcare cost KW - sleep timing KW - sleep duration KW - presleep behaviors KW - sleep experts KW - healthy behavior KW - public health KW - conversational agents N2 - Background: Adequate sleep is essential for maintaining individual and public health, positively affecting cognition and well-being, and reducing chronic disease risks. It plays a significant role in driving the economy, public safety, and managing health care costs. Digital tools, including websites, sleep trackers, and apps, are key in promoting sleep health education. Conversational artificial intelligence (AI) such as ChatGPT (OpenAI, Microsoft Corp) offers accessible, personalized advice on sleep health but raises concerns about potential misinformation. This underscores the importance of ensuring that AI-driven sleep health information is accurate, given its significant impact on individual and public health, and the spread of sleep-related myths. Objective: This study aims to examine ChatGPT's capability to debunk sleep-related disbeliefs. Methods: A mixed methods design was leveraged. ChatGPT categorized 20 sleep-related myths identified by 10 sleep experts and rated them in terms of falseness and public health significance, on a 5-point Likert scale. Sensitivity, positive predictive value, and interrater agreement were also calculated. A qualitative comparative analysis was also conducted. Results: ChatGPT labeled a significant portion (n=17, 85%) of the statements as "false" (n=9, 45%) or "generally false" (n=8, 40%), with varying accuracy across different domains.
For instance, it correctly identified most myths about "sleep timing," "sleep duration," and "behaviors during sleep," while it had varying degrees of success with other categories such as "pre-sleep behaviors" and "brain function and sleep." ChatGPT's assessment of the degree of falseness and public health significance, on the 5-point Likert scale, revealed an average score of 3.45 (SD 0.87) and 3.15 (SD 0.99), respectively, indicating a good level of accuracy in identifying the falseness of statements and a good understanding of their impact on public health. The AI-based tool showed a sensitivity of 85% and a positive predictive value of 100%. Overall, this indicates that when ChatGPT labels a statement as false, it is highly reliable, but it may miss identifying some false statements. When comparing with expert ratings, high intraclass correlation coefficients (ICCs) between ChatGPT's appraisals and expert opinions could be found, suggesting that the AI's ratings were generally aligned with expert views on the falseness (ICC=.83, P<.001) and public health significance (ICC=.79, P=.001) of sleep-related myths. Qualitatively, both ChatGPT and sleep experts refuted sleep-related misconceptions. However, ChatGPT adopted a more accessible style and provided a more generalized view, focusing on broad concepts, while experts sometimes used technical jargon, providing evidence-based explanations. Conclusions: ChatGPT-4 can accurately address sleep-related queries and debunk sleep-related myths, with a performance comparable to that of sleep experts. Given its limitations, the AI cannot completely replace expert opinions, especially in nuanced and complex fields such as sleep health, but it can be a valuable complement in the dissemination of updated information and the promotion of healthy behaviors.
UR - https://formative.jmir.org/2024/1/e55762 UR - http://dx.doi.org/10.2196/55762 UR - http://www.ncbi.nlm.nih.gov/pubmed/38501898 ID - info:doi/10.2196/55762 ER - TY - JOUR AU - Dsouza, Maria Jeanne PY - 2024/4/15 TI - A Student's Viewpoint on ChatGPT Use and Automation Bias in Medical Education JO - JMIR Med Educ SP - e57696 VL - 10 KW - AI KW - artificial intelligence KW - ChatGPT KW - medical education UR - https://mededu.jmir.org/2024/1/e57696 UR - http://dx.doi.org/10.2196/57696 ID - info:doi/10.2196/57696 ER - TY - JOUR AU - Kosyluk, Kristin AU - Baeder, Tanner AU - Greene, Yeona Karah AU - Tran, T. Jennifer AU - Bolton, Cassidy AU - Loecher, Nele AU - DiEva, Daniel AU - Galea, T. Jerome PY - 2024/4/12 TI - Mental Distress, Label Avoidance, and Use of a Mental Health Chatbot: Results From a US Survey JO - JMIR Form Res SP - e45959 VL - 8 KW - chatbots KW - conversational agents KW - mental health KW - resources KW - screening KW - resource referral KW - stigma KW - label avoidance KW - survey KW - training KW - behavioral KW - COVID-19 KW - pilot test KW - design KW - users KW - psychological distress KW - symptoms N2 - Background: For almost two decades, researchers and clinicians have argued that certain aspects of mental health treatment can be removed from clinicians' responsibilities and allocated to technology, preserving valuable clinician time and alleviating the burden on the behavioral health care system. The service delivery tasks that could arguably be allocated to technology without negatively impacting patient outcomes include screening, triage, and referral. Objective: We pilot-tested a chatbot for mental health screening and referral to understand the relationship between potential users' demographics and chatbot use; the completion rate of mental health screening when delivered by a chatbot; and the acceptability of a prototype chatbot designed for mental health screening and referral.
This chatbot not only screened participants for psychological distress but also referred them to appropriate resources that matched their level of distress and preferences. The goal of this study was to determine whether a mental health screening and referral chatbot would be feasible and acceptable to users. Methods: We conducted an internet-based survey among a sample of US-based adults. Our survey collected demographic data along with a battery of measures assessing behavioral health and symptoms, stigma (label avoidance and perceived stigma), attitudes toward treatment-seeking, readiness for change, and technology readiness and acceptance. Participants were then invited to engage with our chatbot. Those who engaged with the chatbot completed a mental health screening, received a distress score based on this screening, were referred to resources appropriate for their current level of distress, and were asked to rate the acceptability of the chatbot. Results: We found that mental health screening using a chatbot was feasible, with 168 (75.7%) of our 222 participants completing mental health screening within the chatbot sessions. Various demographic characteristics were associated with a willingness to use the chatbot. The participants who used the chatbot found it to be acceptable. Logistic regression produced a significant model with perceived usefulness and symptoms as significant positive predictors of chatbot use for the overall sample, and label avoidance as the only significant predictor of chatbot use for those currently experiencing distress. Conclusions: Label avoidance, the desire to avoid mental health services to avoid the stigmatized label of mental illness, is a significant negative predictor of care seeking. Therefore, our finding regarding label avoidance and chatbot use has significant public health implications in terms of facilitating access to mental health resources.
Those who are high on label avoidance are not likely to seek care in a community mental health clinic, yet they are likely willing to engage with a mental health chatbot, participate in mental health screening, and receive mental health resources within the chatbot session. Chatbot technology may prove to be a way to engage those in care who have previously avoided treatment due to stigma. UR - https://formative.jmir.org/2024/1/e45959 UR - http://dx.doi.org/10.2196/45959 UR - http://www.ncbi.nlm.nih.gov/pubmed/38607665 ID - info:doi/10.2196/45959 ER - TY - JOUR AU - Wu, Yijun AU - Zheng, Yue AU - Feng, Baijie AU - Yang, Yuqi AU - Kang, Kai AU - Zhao, Ailin PY - 2024/4/10 TI - Embracing ChatGPT for Medical Education: Exploring Its Impact on Doctors and Medical Students JO - JMIR Med Educ SP - e52483 VL - 10 KW - artificial intelligence KW - AI KW - ChatGPT KW - medical education KW - doctors KW - medical students UR - https://mededu.jmir.org/2024/1/e52483 UR - http://dx.doi.org/10.2196/52483 UR - http://www.ncbi.nlm.nih.gov/pubmed/38598263 ID - info:doi/10.2196/52483 ER - TY - JOUR AU - Hadar-Shoval, Dorit AU - Asraf, Kfir AU - Mizrachi, Yonathan AU - Haber, Yuval AU - Elyoseph, Zohar PY - 2024/4/9 TI - Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz's Theory of Basic Values JO - JMIR Ment Health SP - e55988 VL - 11 KW - large language models KW - LLMs KW - large language model KW - LLM KW - machine learning KW - ML KW - natural language processing KW - NLP KW - deep learning KW - ChatGPT KW - Chat-GPT KW - chatbot KW - chatbots KW - chat-bot KW - chat-bots KW - Claude KW - values KW - Bard KW - artificial intelligence KW - AI KW - algorithm KW - algorithms KW - predictive model KW - predictive models KW - predictive analytics KW - predictive system KW - practical model KW - practical models KW - mental health KW - mental illness KW - mental illnesses KW - mental disease KW -
mental diseases KW - mental disorder KW - mental disorders KW - mobile health KW - mHealth KW - eHealth KW - mood disorder KW - mood disorders N2 - Background: Large language models (LLMs) hold potential for mental health applications. However, their opaque alignment processes may embed biases that shape problematic perspectives. Evaluating the values embedded within LLMs that guide their decision-making has ethical importance. Schwartz's theory of basic values (STBV) provides a framework for quantifying cultural value orientations and has shown utility for examining values in mental health contexts, including cultural, diagnostic, and therapist-client dynamics. Objective: This study aimed to (1) evaluate whether the STBV can measure value-like constructs within leading LLMs and (2) determine whether LLMs exhibit distinct value-like patterns from humans and each other. Methods: In total, 4 LLMs (Bard, Claude 2, Generative Pretrained Transformer [GPT]-3.5, GPT-4) were anthropomorphized and instructed to complete the Portrait Values Questionnaire-Revised (PVQ-RR) to assess value-like constructs. Their responses over 10 trials were analyzed for reliability and validity. To benchmark the LLMs' value profiles, their results were compared to published data from a diverse sample of 53,472 individuals across 49 nations who had completed the PVQ-RR. This allowed us to assess whether the LLMs diverged from established human value patterns across cultural groups. Value profiles were also compared between models via statistical tests. Results: The PVQ-RR showed good reliability and validity for quantifying value-like infrastructure within the LLMs. However, substantial divergence emerged between the LLMs' value profiles and population data. The models lacked consensus and exhibited distinct motivational biases, reflecting opaque alignment processes.
For example, all models prioritized universalism and self-direction, while de-emphasizing achievement, power, and security relative to humans. Successful discriminant analysis differentiated the 4 LLMs' distinct value profiles. Further examination found that the biased value profiles strongly predicted the LLMs' responses when presented with mental health dilemmas requiring choosing between opposing values. This provided further validation that the models embed distinct motivational value-like constructs that shape their decision-making. Conclusions: This study leveraged the STBV to map the motivational value-like infrastructure underpinning leading LLMs. Although the study demonstrated that the STBV can effectively characterize value-like infrastructure within LLMs, substantial divergence from human values raises ethical concerns about aligning these models with mental health applications. The biases toward certain cultural value sets pose risks if integrated without proper safeguards. For example, prioritizing universalism could promote unconditional acceptance even when clinically unwise. Furthermore, the differences between the LLMs underscore the need to standardize alignment processes to capture true cultural diversity. Thus, any responsible integration of LLMs into mental health care must account for their embedded biases and motivation mismatches to ensure equitable delivery across diverse populations. Achieving this will require transparency and refinement of alignment techniques to instill comprehensive human values. 
UR - https://mental.jmir.org/2024/1/e55988 UR - http://dx.doi.org/10.2196/55988 UR - http://www.ncbi.nlm.nih.gov/pubmed/38593424 ID - info:doi/10.2196/55988 ER - TY - JOUR AU - Hirosawa, Takanobu AU - Harada, Yukinori AU - Tokumasu, Kazuki AU - Ito, Takahiro AU - Suzuki, Tomoharu AU - Shimizu, Taro PY - 2024/4/9 TI - Evaluating ChatGPT-4's Diagnostic Accuracy: Impact of Visual Data Integration JO - JMIR Med Inform SP - e55627 VL - 12 KW - artificial intelligence KW - large language model KW - LLM KW - LLMs KW - language model KW - language models KW - ChatGPT KW - GPT KW - ChatGPT-4V KW - ChatGPT-4 Vision KW - clinical decision support KW - natural language processing KW - decision support KW - NLP KW - diagnostic excellence KW - diagnosis KW - diagnoses KW - diagnose KW - diagnostic KW - diagnostics KW - image KW - images KW - imaging N2 - Background: In the evolving field of health care, multimodal generative artificial intelligence (AI) systems, such as ChatGPT-4 with vision (ChatGPT-4V), represent a significant advancement, as they integrate visual data with text data. This integration has the potential to revolutionize clinical diagnostics by offering more comprehensive analysis capabilities. However, the impact on diagnostic accuracy of using image data to augment ChatGPT-4 remains unclear. Objective: This study aims to assess the impact of adding image data on ChatGPT-4's diagnostic accuracy and provide insights into how image data integration can enhance the accuracy of multimodal AI in medical diagnostics. Specifically, this study endeavored to compare the diagnostic accuracy between ChatGPT-4V, which processed both text and image data, and its counterpart, ChatGPT-4, which only uses text data. Methods: We identified a total of 557 case reports published in the American Journal of Case Reports from January 2022 to March 2023. 
After excluding cases that were nondiagnostic, pediatric, and lacking image data, we included 363 case descriptions with their final diagnoses and associated images. We compared the diagnostic accuracy of ChatGPT-4V and ChatGPT-4 without vision based on their ability to include the final diagnoses within differential diagnosis lists. Two independent physicians evaluated their accuracy, with a third resolving any discrepancies, ensuring a rigorous and objective analysis. Results: The integration of image data into ChatGPT-4V did not significantly enhance diagnostic accuracy, showing that final diagnoses were included in the top 10 differential diagnosis lists at a rate of 85.1% (n=309), comparable to the rate of 87.9% (n=319) for the text-only version (P=.33). Notably, ChatGPT-4V's performance in correctly identifying the top diagnosis was inferior, at 44.4% (n=161), compared with 55.9% (n=203) for the text-only version (P=.002, χ2 test). Additionally, ChatGPT-4's self-reports showed that image data accounted for 30% of the weight in developing the differential diagnosis lists in more than half of cases. Conclusions: Our findings reveal that currently, ChatGPT-4V predominantly relies on textual data, limiting its ability to fully use the diagnostic potential of visual information. This study underscores the need for further development of multimodal generative AI systems to effectively integrate and use clinical image data. Enhancing the diagnostic performance of such AI systems through improved multimodal data integration could significantly benefit patient care by providing more accurate and comprehensive diagnostic insights. Future research should focus on overcoming these limitations, paving the way for the practical application of advanced AI in medicine. 
UR - https://medinform.jmir.org/2024/1/e55627 UR - http://dx.doi.org/10.2196/55627 UR - http://www.ncbi.nlm.nih.gov/pubmed/38592758 ID - info:doi/10.2196/55627 ER - TY - JOUR AU - Sivarajkumar, Sonish AU - Kelley, Mark AU - Samolyk-Mazzanti, Alyssa AU - Visweswaran, Shyam AU - Wang, Yanshan PY - 2024/4/8 TI - An Empirical Evaluation of Prompting Strategies for Large Language Models in Zero-Shot Clinical Natural Language Processing: Algorithm Development and Validation Study JO - JMIR Med Inform SP - e55318 VL - 12 KW - large language model KW - LLM KW - LLMs KW - natural language processing KW - NLP KW - in-context learning KW - prompt engineering KW - evaluation KW - zero-shot KW - few shot KW - prompting KW - GPT KW - language model KW - language KW - models KW - machine learning KW - clinical data KW - clinical information KW - extraction KW - BARD KW - Gemini KW - LLaMA-2 KW - heuristic KW - prompt KW - prompts KW - ensemble N2 - Background: Large language models (LLMs) have shown remarkable capabilities in natural language processing (NLP), especially in domains where labeled data are scarce or expensive, such as the clinical domain. However, to unlock the clinical knowledge hidden in these LLMs, we need to design effective prompts that can guide them to perform specific clinical NLP tasks without any task-specific training data. This is known as in-context learning, which is an art and science that requires understanding the strengths and weaknesses of different LLMs and prompt engineering approaches. Objective: The objective of this study is to assess the effectiveness of various prompt engineering techniques, including 2 newly introduced types, heuristic and ensemble prompts, for zero-shot and few-shot clinical information extraction using pretrained language models. 
Methods: This comprehensive experimental study evaluated different prompt types (simple prefix, simple cloze, chain of thought, anticipatory, heuristic, and ensemble) across 5 clinical NLP tasks: clinical sense disambiguation, biomedical evidence extraction, coreference resolution, medication status extraction, and medication attribute extraction. The performance of these prompts was assessed using 3 state-of-the-art language models: GPT-3.5 (OpenAI), Gemini (Google), and LLaMA-2 (Meta). The study contrasted zero-shot with few-shot prompting and explored the effectiveness of ensemble approaches. Results: The study revealed that task-specific prompt tailoring is vital for the high performance of LLMs for zero-shot clinical NLP. In clinical sense disambiguation, GPT-3.5 achieved an accuracy of 0.96 with heuristic prompts and 0.94 in biomedical evidence extraction. Heuristic prompts, alongside chain of thought prompts, were highly effective across tasks. Few-shot prompting improved performance in complex scenarios, and ensemble approaches capitalized on multiple prompt strengths. GPT-3.5 consistently outperformed Gemini and LLaMA-2 across tasks and prompt types. Conclusions: This study provides a rigorous evaluation of prompt engineering methodologies and introduces innovative techniques for clinical information extraction, demonstrating the potential of in-context learning in the clinical domain. These findings offer clear guidelines for future prompt-based clinical NLP research, facilitating engagement by non-NLP experts in clinical NLP advancements. To the best of our knowledge, this is one of the first works on the empirical evaluation of different prompt engineering approaches for clinical NLP in this era of generative artificial intelligence, and we hope that it will inspire and inform future research in this area. 
UR - https://medinform.jmir.org/2024/1/e55318 UR - http://dx.doi.org/10.2196/55318 UR - http://www.ncbi.nlm.nih.gov/pubmed/38587879 ID - info:doi/10.2196/55318 ER - TY - JOUR AU - Mugaanyi, Joseph AU - Cai, Liuying AU - Cheng, Sumei AU - Lu, Caide AU - Huang, Jing PY - 2024/4/5 TI - Evaluation of Large Language Model Performance and Reliability for Citations and References in Scholarly Writing: Cross-Disciplinary Study JO - J Med Internet Res SP - e52935 VL - 26 KW - large language models KW - accuracy KW - academic writing KW - AI KW - cross-disciplinary evaluation KW - scholarly writing KW - ChatGPT KW - GPT-3.5 KW - writing tool KW - scholarly KW - academic discourse KW - LLMs KW - machine learning algorithms KW - NLP KW - natural language processing KW - citations KW - references KW - natural science KW - humanities KW - chatbot KW - artificial intelligence N2 - Background: Large language models (LLMs) have gained prominence since the release of ChatGPT in late 2022. Objective: The aim of this study was to assess the accuracy of citations and references generated by ChatGPT (GPT-3.5) in two distinct academic domains: the natural sciences and humanities. Methods: Two researchers independently prompted ChatGPT to write an introduction section for a manuscript and include citations; they then evaluated the accuracy of the citations and Digital Object Identifiers (DOIs). Results were compared between the two disciplines. Results: Ten topics were included, including 5 in the natural sciences and 5 in the humanities. A total of 102 citations were generated, with 55 in the natural sciences and 47 in the humanities. Among these, 40 citations (72.7%) in the natural sciences and 36 citations (76.6%) in the humanities were confirmed to exist (P=.42). 
There were significant disparities found in DOI presence in the natural sciences (39/55, 70.9%) and the humanities (18/47, 38.3%), along with significant differences in accuracy between the two disciplines (18/55, 32.7% vs 4/47, 8.5%). DOI hallucination was more prevalent in the humanities (42/47, 89.4%). The Levenshtein distance was significantly higher in the humanities than in the natural sciences, reflecting the lower DOI accuracy. Conclusions: ChatGPT's performance in generating citations and references varies across disciplines. Differences in DOI standards and disciplinary nuances contribute to performance variations. Researchers should consider the strengths and limitations of artificial intelligence writing tools with respect to citation accuracy. The use of domain-specific models may enhance accuracy. UR - https://www.jmir.org/2024/1/e52935 UR - http://dx.doi.org/10.2196/52935 UR - http://www.ncbi.nlm.nih.gov/pubmed/38578685 ID - info:doi/10.2196/52935 ER - TY - JOUR AU - Livermore, Polly AU - Kupiec, Klaudia AU - Wedderburn, R. Lucy AU - Knight, Andrea AU - Solebo, L. Ameenat AU - Shafran, Roz AU - Robert, Glenn AU - Sebire, J. N. AU - Gibson, Faith PY - 2024/4/3 TI - Designing, Developing, and Testing a Chatbot for Parents and Caregivers of Children and Young People With Rheumatological Conditions (the IMPACT Study): Protocol for a Co-Designed Proof-of-Concept Study JO - JMIR Res Protoc SP - e57238 VL - 13 KW - caregivers KW - chatbot KW - paediatric rheumatology KW - parents and caregivers KW - parents/carers KW - pediatric KW - proof-of-concept KW - quality of life KW - rheumatology N2 - Background: Pediatric rheumatology is a term that encompasses over 80 conditions affecting different organs and systems. Children and young people with rheumatological chronic conditions are known to have high levels of mental health problems and therefore are at risk of poor health outcomes. 
Clinical psychologists can help children and young people manage the daily difficulties of living with one of these conditions; however, there are insufficient pediatric psychologists in the United Kingdom. We urgently need to consider other ways of providing early, essential support to improve their current well-being. One way of doing this is to empower parents and caregivers to have more of the answers that their children and young people need to support them further between their hospital appointments. Objective: The objective of this co-designed proof-of-concept study is to design, develop, and test a chatbot intervention to support parents and caregivers of children and young people with rheumatological conditions. Methods: This study will explore the needs and views of children and young people with rheumatological conditions, their siblings, parents, and caregivers, as well as health care professionals working in pediatric rheumatology. We will ask approximately 100 participants in focus groups where they think the gaps are in current clinical care and what ideas they have for improving upon them. Creative experience-based co-design workshops will then decide upon top priorities to develop further while informing the appearance, functionality, and practical delivery of a chatbot intervention. Upon completion of a minimum viable product, approximately 100 parents and caregivers will user-test the chatbot intervention in an iterative sprint methodology to determine its worth as a mechanism for support for parents. Results: A total of 73 children, young people, parents, caregivers, and health care professionals have so far been enrolled in the study, which began in November 2023. The anticipated completion date of the study is April 2026. The data analysis is expected to be completed in January 2026, with the results being published in April 2026. 
Conclusions: This study will provide evidence on the accessibility, acceptability, and usability of a chatbot intervention for parents and caregivers of children and young people with rheumatological conditions. If proven useful, it could lead to a future efficacy trial of one of the first chatbot interventions to provide targeted and user-suggested support for parents and caregivers of children with chronic health conditions in health care services. This study is unique in that it will detail the needs and wants of children, young people, siblings, parents, and caregivers to improve the current support given to families living with pediatric rheumatological conditions. It will be conducted across the whole of the United Kingdom for all pediatric rheumatological conditions at all stages of the disease trajectory. International Registered Report Identifier (IRRID): DERR1-10.2196/57238 UR - https://www.researchprotocols.org/2024/1/e57238 UR - http://dx.doi.org/10.2196/57238 UR - http://www.ncbi.nlm.nih.gov/pubmed/38568725 ID - info:doi/10.2196/57238 ER - TY - JOUR AU - Slade, Emily AU - Rennick-Egglestone, Stefan AU - Ng, Fiona AU - Kotera, Yasuhiro AU - Llewellyn-Beardsley, Joy AU - Newby, Chris AU - Glover, Tony AU - Keppens, Jeroen AU - Slade, Mike PY - 2024/3/29 TI - The Implementation of Recommender Systems for Mental Health Recovery Narratives: Evaluation of Use and Performance JO - JMIR Ment Health SP - e45754 VL - 11 KW - recommender system KW - mean absolute error KW - precision KW - intralist diversity KW - item space coverage KW - fairness across users KW - psychosis KW - Narrative Experiences Online trial KW - NEON trial KW - lived experience narrative KW - recovery story N2 - Background: Recommender systems help narrow down a large range of items to a smaller, personalized set. 
NarraGive is a first-in-field hybrid recommender system for mental health recovery narratives, recommending narratives based on their content and narrator characteristics (using content-based filtering) and on narratives beneficially impacting other similar users (using collaborative filtering). NarraGive is integrated into the Narrative Experiences Online (NEON) intervention, a web application providing access to the NEON Collection of recovery narratives. Objective: This study aims to analyze the 3 recommender system algorithms used in NarraGive to inform future interventions using recommender systems for lived experience narratives. Methods: Using a recently published framework for evaluating recommender systems to structure the analysis, we compared the content-based filtering algorithm and collaborative filtering algorithms by evaluating the accuracy (how close the predicted ratings are to the true ratings), precision (the proportion of the recommended narratives that are relevant), diversity (how diverse the recommended narratives are), coverage (the proportion of all available narratives that can be recommended), and unfairness (whether the algorithms produce less accurate predictions for disadvantaged participants) across gender and ethnicity. We used data from all participants in 2 parallel-group, waitlist control clinical trials of the NEON intervention (NEON trial: N=739; NEON for other [eg, nonpsychosis] mental health problems [NEON-O] trial: N=1023). Both trials included people with self-reported mental health problems who had and had not used statutory mental health services. In addition, NEON trial participants had experienced self-reported psychosis in the previous 5 years. Our evaluation used a database of Likert-scale narrative ratings provided by trial participants in response to validated narrative feedback questions. Results: Participants from the NEON and NEON-O trials provided 2288 and 1896 narrative ratings, respectively. 
Each rated narrative had a median of 3 ratings and 2 ratings, respectively. For the NEON trial, the content-based filtering algorithm performed better for coverage; the collaborative filtering algorithms performed better for accuracy, diversity, and unfairness across both gender and ethnicity; and neither algorithm performed better for precision. For the NEON-O trial, the content-based filtering algorithm did not perform better on any metric; the collaborative filtering algorithms performed better on accuracy and unfairness across both gender and ethnicity; and neither algorithm performed better for precision, diversity, or coverage. Conclusions: Clinical population may be associated with recommender system performance. Recommender systems are susceptible to a wide range of undesirable biases. Approaches to mitigating these include providing enough initial data for the recommender system (to prevent overfitting), ensuring that items can be accessed outside the recommender system (to prevent a feedback loop between accessed items and recommended items), and encouraging participants to provide feedback on every narrative they interact with (to prevent participants from only providing feedback when they have strong opinions). UR - https://mental.jmir.org/2024/1/e45754 UR - http://dx.doi.org/10.2196/45754 UR - http://www.ncbi.nlm.nih.gov/pubmed/38551630 ID - info:doi/10.2196/45754 ER - TY - JOUR AU - Wang, Lei AU - Ma, Yinyao AU - Bi, Wenshuai AU - Lv, Hanlin AU - Li, Yuxiang PY - 2024/3/29 TI - An Entity Extraction Pipeline for Medical Text Records Using Large Language Models: Analytical Study JO - J Med Internet Res SP - e54580 VL - 26 KW - clinical data extraction KW - large language models KW - feature hallucination KW - modular approach KW - unstructured data processing N2 - Background: The study of disease progression relies on clinical data, including text data, and extracting valuable features from text data has been a research hot spot. 
With the rise of large language models (LLMs), semantic-based extraction pipelines are gaining acceptance in clinical research. However, the security and feature hallucination issues of LLMs require further attention. Objective: This study aimed to introduce a novel modular LLM pipeline, which could semantically extract features from textual patient admission records. Methods: The pipeline was designed to process a systematic succession of concept extraction, aggregation, question generation, corpus extraction, and question-and-answer scale extraction, which was tested via 2 low-parameter LLMs: Qwen-14B-Chat (QWEN) and Baichuan2-13B-Chat (BAICHUAN). A data set of 25,709 pregnancy cases from the People's Hospital of Guangxi Zhuang Autonomous Region, China, was used for evaluation with the help of a local expert's annotation. The pipeline was evaluated with the metrics of accuracy and precision, null ratio, and time consumption. Additionally, we evaluated its performance via a quantized version of Qwen-14B-Chat on a consumer-grade GPU. Results: The pipeline demonstrates a high level of precision in feature extraction, as evidenced by the accuracy and precision results of Qwen-14B-Chat (95.52% and 92.93%, respectively) and Baichuan2-13B-Chat (95.86% and 90.08%, respectively). Furthermore, the pipeline exhibited low null ratios and variable time consumption. The INT4-quantized version of QWEN delivered an enhanced performance with 97.28% accuracy and a 0% null ratio. Conclusions: The pipeline exhibited consistent performance across different LLMs and efficiently extracted clinical features from textual data. It also showed reliable performance on consumer-grade hardware. This approach offers a viable and effective solution for mining clinical research data from textual records. 
UR - https://www.jmir.org/2024/1/e54580 UR - http://dx.doi.org/10.2196/54580 UR - http://www.ncbi.nlm.nih.gov/pubmed/38551633 ID - info:doi/10.2196/54580 ER - TY - JOUR AU - Noda, Masao AU - Ueno, Takayoshi AU - Koshu, Ryota AU - Takaso, Yuji AU - Shimada, Dias Mari AU - Saito, Chizu AU - Sugimoto, Hisashi AU - Fushiki, Hiroaki AU - Ito, Makoto AU - Nomura, Akihiro AU - Yoshizaki, Tomokazu PY - 2024/3/28 TI - Performance of GPT-4V in Answering the Japanese Otolaryngology Board Certification Examination Questions: Evaluation Study JO - JMIR Med Educ SP - e57054 VL - 10 KW - artificial intelligence KW - GPT-4v KW - large language model KW - otolaryngology KW - GPT KW - ChatGPT KW - LLM KW - LLMs KW - language model KW - language models KW - head KW - respiratory KW - ENT: ear KW - nose KW - throat KW - neck KW - NLP KW - natural language processing KW - image KW - images KW - exam KW - exams KW - examination KW - examinations KW - answer KW - answers KW - answering KW - response KW - responses N2 - Background: Artificial intelligence models can learn from medical literature and clinical cases and generate answers that rival human experts. However, challenges remain in the analysis of complex data containing images and diagrams. Objective: This study aims to assess the answering capabilities and accuracy of ChatGPT-4 Vision (GPT-4V) for a set of 100 questions, including image-based questions, from the 2023 otolaryngology board certification examination. Methods: Answers to 100 questions from the 2023 otolaryngology board certification examination, including image-based questions, were generated using GPT-4V. The accuracy rate was evaluated using different prompts, and the presence of images, clinical area of the questions, and variations in the answer content were examined. Results: The accuracy rate for text-only input was, on average, 24.7% but improved to 47.3% with the addition of English translation and prompts (P<.001). 
The average nonresponse rate for text-only input was 46.3%; this decreased to 2.7% with the addition of English translation and prompts (P<.001). The accuracy rate was lower for image-based questions than for text-only questions across all types of input, with a relatively high nonresponse rate. General questions and questions from the fields of head and neck allergies and nasal allergies had relatively high accuracy rates, which increased with the addition of translation and prompts. In terms of content, questions related to anatomy had the highest accuracy rate. For all content types, the addition of translation and prompts increased the accuracy rate. For image-based questions, the average correct answer rate with text-only input was 30.4%, and that with text-plus-image input was 41.3% (P=.02). Conclusions: Examination of artificial intelligence's answering capabilities for the otolaryngology board certification examination improves our understanding of its potential and limitations in this field. Although improvement was noted with the addition of translation and prompts, the accuracy rate for image-based questions was lower than that for text-based questions, suggesting room for improvement in GPT-4V at this stage. Furthermore, text-plus-image input yielded a higher correct answer rate on image-based questions. Our findings imply the usefulness and potential of GPT-4V in medicine; however, future consideration of safe use methods is needed. UR - https://mededu.jmir.org/2024/1/e57054 UR - http://dx.doi.org/10.2196/57054 UR - http://www.ncbi.nlm.nih.gov/pubmed/38546736 ID - info:doi/10.2196/57054 ER - TY - JOUR AU - Gandhi, P. Aravind AU - Joesph, Karen Felista AU - Rajagopal, Vineeth AU - Aparnavi, P. 
AU - Katkuri, Sushma AU - Dayama, Sonal AU - Satapathy, Prakasini AU - Khatib, Nazli Mahalaqua AU - Gaidhane, Shilpa AU - Zahiruddin, Syed Quazi AU - Behera, Ashish PY - 2024/3/25 TI - Performance of ChatGPT on the India Undergraduate Community Medicine Examination: Cross-Sectional Study JO - JMIR Form Res SP - e49964 VL - 8 KW - artificial intelligence KW - ChatGPT KW - community medicine KW - India KW - large language model KW - medical education KW - digitalization N2 - Background: Medical students may increasingly use large language models (LLMs) in their learning. ChatGPT is an LLM at the forefront of this new development in medical education with the capacity to respond to multidisciplinary questions. Objective: The aim of this study was to evaluate the ability of ChatGPT 3.5 to complete the Indian undergraduate medical examination in the subject of community medicine. We further compared ChatGPT scores with the scores obtained by the students. Methods: The study was conducted at a publicly funded medical college in Hyderabad, India. The study was based on the internal assessment examination conducted in January 2023 for students in the Bachelor of Medicine and Bachelor of Surgery Final Year (Part I) program; the examination of focus included 40 questions (divided between two papers) from the community medicine subject syllabus. Each paper had three sections with different weightage of marks for each section: section one had two long essay-type questions worth 15 marks each, section two had 8 short essay-type questions worth 5 marks each, and section three had 10 short-answer questions worth 3 marks each. The same questions were administered as prompts to ChatGPT 3.5 and the responses were recorded. Apart from scoring ChatGPT responses, two independent evaluators explored the responses to each question to further analyze their quality with regard to three subdomains: relevancy, coherence, and completeness. 
Each question was scored in these subdomains on a Likert scale of 1-5. The average of the two evaluators was taken as the subdomain score of the question. The proportion of questions with a score of at least 50% of the maximum score (5) in each subdomain was calculated. Results: ChatGPT 3.5 scored 72.3% on paper 1 and 61% on paper 2. The mean score of the 94 students was 43% on paper 1 and 45% on paper 2. The responses of ChatGPT 3.5 were also rated to be satisfactorily relevant, coherent, and complete for most of the questions (>80%). Conclusions: ChatGPT 3.5 appears to have substantial and sufficient knowledge to understand and answer the Indian medical undergraduate examination in the subject of community medicine. ChatGPT may be introduced to students to enable the self-directed learning of community medicine in pilot mode. However, faculty oversight will be required as ChatGPT is still in the initial stages of development, and thus its potential and reliability of medical content from the Indian context need to be further explored comprehensively. UR - https://formative.jmir.org/2024/1/e49964 UR - http://dx.doi.org/10.2196/49964 UR - http://www.ncbi.nlm.nih.gov/pubmed/38526538 ID - info:doi/10.2196/49964 ER - TY - JOUR AU - Arnold, Virginia AU - Purnat, D. 
Tina AU - Marten, Robert AU - Pattison, Andrew AU - Gouda, Hebe PY - 2024/3/21 TI - Chatbots and COVID-19: Taking Stock of the Lessons Learned JO - J Med Internet Res SP - e54840 VL - 26 KW - chatbots KW - COVID-19 KW - health KW - public health KW - pandemic KW - health care UR - https://www.jmir.org/2024/1/e54840 UR - http://dx.doi.org/10.2196/54840 UR - http://www.ncbi.nlm.nih.gov/pubmed/38512309 ID - info:doi/10.2196/54840 ER - TY - JOUR AU - Karkosz, Stanisław AU - Szymański, Robert AU - Sanna, Katarzyna AU - Michałowski, Jarosław PY - 2024/3/20 TI - Effectiveness of a Web-based and Mobile Therapy Chatbot on Anxiety and Depressive Symptoms in Subclinical Young Adults: Randomized Controlled Trial JO - JMIR Form Res SP - e47960 VL - 8 KW - chatbots KW - conversational agents KW - chatbot KW - conversational agent KW - artificial intelligence KW - mental health KW - depression KW - anxiety KW - depressive KW - cognitive distortions KW - young adults KW - randomized control trial KW - RCT KW - user experience KW - CBT KW - psychotherapy KW - cognitive behavioral therapy N2 - Background: There has been an increased need to provide specialized help for people with depressive and anxiety symptoms, particularly teenagers and young adults. There is evidence from a 2-week intervention that chatbots (eg, Woebot) are effective in reducing depression and anxiety, an effect that was not detected in the control group that was provided self-help materials. Although chatbots are a promising solution, there is limited scientific evidence for the efficacy of agent-guided cognitive behavioral therapy (CBT) outside the English language, especially for highly inflected languages. Objective: This study aimed to measure the efficacy of Fido, a therapy chatbot that uses the Polish language. It targets depressive and anxiety symptoms using CBT techniques. 
We hypothesized that participants using Fido would show a greater reduction in anxiety and depressive symptoms than the control group. Methods: We conducted a 2-arm, open-label, randomized controlled trial with 81 participants with subclinical depression or anxiety who were recruited via social media. Participants were divided into experimental (interacted with a fully automated Fido chatbot) and control (received a self-help book) groups. Both intervention methods addressed topics such as general psychoeducation and cognitive distortion identification and modification via Socratic questioning. The chatbot also featured suicidal ideation identification and redirection to suicide hotlines. We used self-assessment scales to measure primary outcomes, including the levels of depression, anxiety, worry tendencies, satisfaction with life, and loneliness at baseline, after the 2-week intervention, and at the 1-month follow-up. We also controlled for secondary outcomes, including engagement and frequency of use. Results: There were no differences in anxiety and depressive symptoms between the groups at enrollment and baseline. After the intervention, depressive and anxiety symptoms were reduced in both groups (chatbot: n=36; control: n=38), which remained stable at the 1-month follow-up. Loneliness was not significantly different between the groups after the intervention, but an exploratory analysis showed a decline in loneliness among participants who used Fido more frequently. Both groups used their intervention technique with similar frequency; however, the control group spent more time (mean 117.57, SD 72.40 minutes) on the intervention than the Fido group (mean 79.44, SD 42.96 minutes). Conclusions: We did not replicate the findings from previous (eg, Woebot) studies, as both arms yielded therapeutic effects. However, such results are in line with other research on internet interventions. 
Nevertheless, Fido provided sufficient help to reduce anxiety and depressive symptoms and decreased perceived loneliness among high-frequency users, which is some of the first evidence of the efficacy of chatbot agents that use a highly inflected language. Further research is needed to determine the long-term, real-world effectiveness of Fido and its efficacy in a clinical sample. Trial Registration: ClinicalTrials.gov NCT05762939; https://clinicaltrials.gov/study/NCT05762939; Open Science Framework Registry 2cqt3; https://osf.io/2cqt3 UR - https://formative.jmir.org/2024/1/e47960 UR - http://dx.doi.org/10.2196/47960 UR - http://www.ncbi.nlm.nih.gov/pubmed/38506892 ID - info:doi/10.2196/47960 ER - TY - JOUR AU - Yim, Dobin AU - Khuntia, Jiban AU - Parameswaran, Vijaya AU - Meyers, Arlen PY - 2024/3/20 TI - Preliminary Evidence of the Use of Generative AI in Health Care Clinical Services: Systematic Narrative Review JO - JMIR Med Inform SP - e52073 VL - 12 KW - generative artificial intelligence tools and applications KW - GenAI KW - service KW - clinical KW - health care KW - transformation KW - digital N2 - Background: Generative artificial intelligence tools and applications (GenAI) are being increasingly used in health care. Physicians, specialists, and other providers have started primarily using GenAI as an aid or tool to gather knowledge, provide information, train, or generate suggestive dialogue between physicians and patients or between physicians and patients' families or friends. However, unless the use of GenAI is oriented to be helpful in clinical service encounters that can improve the accuracy of diagnosis, treatment, and patient outcomes, the expected potential will not be achieved. As adoption continues, it is essential to validate the effectiveness of the infusion of GenAI as an intelligent technology in service encounters to understand the gap in actual clinical service use of GenAI.
Objective: This study synthesizes preliminary evidence on how GenAI assists, guides, and automates clinical service rendering and encounters in health care. The review scope was limited to articles published in peer-reviewed medical journals. Methods: We screened and selected 0.38% (161/42,459) of articles published between January 1, 2020, and May 31, 2023, identified from PubMed. We followed the protocols outlined in the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines to select highly relevant studies with at least 1 element on clinical use, evaluation, and validation to provide evidence of GenAI use in clinical services. The articles were classified based on their relevance to clinical service functions or activities using the descriptive and analytical information presented in the articles. Results: Of 161 articles, 141 (87.6%) reported using GenAI to assist services through knowledge access, collation, and filtering. GenAI was used for disease detection (19/161, 11.8%), diagnosis (14/161, 8.7%), and screening processes (12/161, 7.5%) in the areas of radiology (17/161, 10.6%), cardiology (12/161, 7.5%), gastrointestinal medicine (4/161, 2.5%), and diabetes (6/161, 3.7%). The literature synthesis in this study suggests that GenAI is mainly used for diagnostic processes, improvement of diagnosis accuracy, and screening and diagnostic purposes using knowledge access. Although this solves the problem of knowledge access and may improve diagnostic accuracy, it is oriented toward higher value creation in health care. Conclusions: GenAI informs rather than assists or automates clinical service functions in health care. There is potential in clinical services, but it has yet to be actualized for GenAI. More clinical service-level evidence that GenAI is used to streamline some functions, or provides more automated help than only information retrieval, is needed.
To transform health care as purported, more studies related to GenAI applications must automate and guide human-performed services and keep up with the optimism that forward-thinking health care organizations will take advantage of GenAI. UR - https://medinform.jmir.org/2024/1/e52073 UR - http://dx.doi.org/10.2196/52073 UR - http://www.ncbi.nlm.nih.gov/pubmed/38506918 ID - info:doi/10.2196/52073 ER - TY - JOUR AU - Magalhães Araujo, Sabrina AU - Cruz-Correia, Ricardo PY - 2024/3/20 TI - Incorporating ChatGPT in Medical Informatics Education: Mixed Methods Study on Student Perceptions and Experiential Integration Proposals JO - JMIR Med Educ SP - e51151 VL - 10 KW - education KW - medical informatics KW - artificial intelligence KW - AI KW - generative language model KW - ChatGPT N2 - Background: The integration of artificial intelligence (AI) technologies, such as ChatGPT, in the educational landscape has the potential to enhance the learning experience of medical informatics students and prepare them for using AI in professional settings. The incorporation of AI in classes aims to develop critical thinking by encouraging students to interact with ChatGPT and critically analyze the responses generated by the chatbot. This approach also helps students develop important skills in the field of biomedical and health informatics to enhance their interaction with AI tools. Objective: The aim of the study is to explore the perceptions of students regarding the use of ChatGPT as a learning tool in their educational context and provide professors with examples of prompts for incorporating ChatGPT into their teaching and learning activities, thereby enhancing the educational experience for students in medical informatics courses. Methods: This study used a mixed methods approach to gain insights from students regarding the use of ChatGPT in education. To accomplish this, a structured questionnaire was applied to evaluate students'
familiarity with ChatGPT, gauge their perceptions of its use, and understand their attitudes toward its use in academic and learning tasks. Learning outcomes of 2 courses were analyzed to propose ChatGPT's incorporation in master's programs in medicine and medical informatics. Results: The majority of students expressed satisfaction with the use of ChatGPT in education, finding it beneficial for various purposes, including generating academic content, brainstorming ideas, and rewriting text. While some participants raised concerns about potential biases and the need for informed use, the overall perception was positive. Additionally, the study proposed integrating ChatGPT into 2 specific courses in the master's programs in medicine and medical informatics. The incorporation of ChatGPT was envisioned to enhance student learning experiences and assist in project planning, programming code generation, examination preparation, workflow exploration, and technical interview preparation, thus advancing medical informatics education. In medical teaching, it will be used as an assistant for simplifying the explanation of concepts and solving complex problems, as well as for generating clinical narratives and patient simulators. Conclusions: The study's valuable insights into medical faculty students' perspectives and integration proposals for ChatGPT serve as an informative guide for professors aiming to enhance medical informatics education. The research delves into the potential of ChatGPT, emphasizes the necessity of collaboration in academic environments, identifies subject areas with discernible benefits, and underscores its transformative role in fostering innovative and engaging learning experiences. The envisaged proposals hold promise in empowering future health care professionals to work in the rapidly evolving era of digital health care.
UR - https://mededu.jmir.org/2024/1/e51151 UR - http://dx.doi.org/10.2196/51151 UR - http://www.ncbi.nlm.nih.gov/pubmed/38506920 ID - info:doi/10.2196/51151 ER - TY - JOUR AU - Xue, Zhaowen AU - Zhang, Yiming AU - Gan, Wenyi AU - Wang, Huajun AU - She, Guorong AU - Zheng, Xiaofei PY - 2024/3/14 TI - Quality and Dependability of ChatGPT and DingXiangYuan Forums for Remote Orthopedic Consultations: Comparative Analysis JO - J Med Internet Res SP - e50882 VL - 26 KW - artificial intelligence KW - ChatGPT KW - consultation KW - musculoskeletal KW - natural language processing KW - remote medical consultation KW - orthopaedic KW - orthopaedics N2 - Background: The widespread use of artificial intelligence, such as ChatGPT (OpenAI), is transforming sectors, including health care, while separate advancements of the internet have enabled platforms such as China's DingXiangYuan to offer remote medical services. Objective: This study evaluates ChatGPT-4's responses against those of professional health care providers in telemedicine, assessing artificial intelligence's capability to support the surge in remote medical consultations and its impact on health care delivery. Methods: We sourced remote orthopedic consultations from "Doctor DingXiang," with responses from its certified physicians as the control and ChatGPT's responses as the experimental group. In all, 3 blinded, experienced orthopedic surgeons assessed responses against 7 criteria: "logical reasoning," "internal information," "external information," "guiding function," "therapeutic effect," "medical knowledge popularization education," and "overall satisfaction." We used Fleiss κ to measure agreement among multiple raters. Results: Initially, consultation records for a cumulative count of 8 maladies (equivalent to 800 cases) were gathered.
We ultimately included 73 consultation records by May 2023, following primary and rescreening, in which no communication records containing private information, images, or voice messages were transmitted. After statistical scoring, we discovered that ChatGPT's "internal information" score (mean 4.61, SD 0.52 points vs mean 4.66, SD 0.49 points; P=.43) and "therapeutic effect" score (mean 4.43, SD 0.75 points vs mean 4.55, SD 0.62 points; P=.32) were lower than those of the control group, but the differences were not statistically significant. ChatGPT showed better performance with a higher "logical reasoning" score (mean 4.81, SD 0.36 points vs mean 4.75, SD 0.39 points; P=.38), "external information" score (mean 4.06, SD 0.72 points vs mean 3.92, SD 0.77 points; P=.25), and "guiding function" score (mean 4.73, SD 0.51 points vs mean 4.72, SD 0.54 points; P=.96), although the differences were not statistically significant. Meanwhile, the "medical knowledge popularization education" score of ChatGPT was better than that of the control group (mean 4.49, SD 0.67 points vs mean 3.87, SD 1.01 points; P<.001), and the difference was statistically significant. In terms of "overall satisfaction," the difference was not statistically significant between the groups (mean 8.35, SD 1.38 points vs mean 8.37, SD 1.24 points; P=.92). According to how Fleiss κ values were interpreted, 6 of the control group's score points were classified as displaying "fair agreement" (P<.001), and 1 was classified as showing "substantial agreement" (P<.001). In the experimental group, 3 points were classified as indicating "fair agreement," while 4 suggested "moderate agreement" (P<.001). Conclusions: ChatGPT-4 matches the expertise found in DingXiangYuan forums' paid consultations, excelling particularly in scientific education. It presents a promising alternative for remote health advice.
For health care professionals, it could act as an aid in patient education, while patients may use it as a convenient tool for health inquiries. UR - https://www.jmir.org/2024/1/e50882 UR - http://dx.doi.org/10.2196/50882 UR - http://www.ncbi.nlm.nih.gov/pubmed/38483451 ID - info:doi/10.2196/50882 ER - TY - JOUR AU - Shidara, Kazuhiro AU - Tanaka, Hiroki AU - Adachi, Hiroyoshi AU - Kanayama, Daisuke AU - Kudo, Takashi AU - Nakamura, Satoshi PY - 2024/3/14 TI - Adapting the Number of Questions Based on Detected Psychological Distress for Cognitive Behavioral Therapy With an Embodied Conversational Agent: Comparative Study JO - JMIR Form Res SP - e50056 VL - 8 KW - cognitive behavioral therapy KW - psychological distress detection KW - embodied conversational agents KW - automatic thoughts KW - long short-term memory KW - multitask learning N2 - Background: The high prevalence of mental illness is a critical social problem. The limited availability of mental health services is a major factor that exacerbates this problem. One solution is to deliver cognitive behavioral therapy (CBT) using an embodied conversational agent (ECA). ECAs make it possible to provide health care without location or time constraints. One of the techniques used in CBT is Socratic questioning, which guides users to correct negative thoughts. The effectiveness of this approach depends on a therapist's skill in adapting to the user's mood or distress level. However, current ECAs do not possess this skill. Therefore, it is essential to implement this adaptation ability in ECAs. Objective: This study aims to develop and evaluate a method that automatically adapts the number of Socratic questions based on the level of detected psychological distress during a CBT session with an ECA.
We hypothesize that this adaptive approach to selecting the number of questions will lower psychological distress, reduce negative emotional states, and produce more substantial cognitive changes compared with a random number of questions. Methods: In this study, which envisions health care support in daily life, we recruited participants aged 18 to 65 years for an experiment that involved 2 different conditions: an ECA that adapted the number of questions based on psychological distress detection or an ECA that asked only a random number of questions. The participants were assigned to 1 of the 2 conditions, experienced a single CBT session with an ECA, and completed questionnaires before and after the session. Results: The participants completed the experiment. There were slight differences in sex, age, and preexperimental psychological distress levels between the 2 conditions. The adapted number of questions condition showed significantly lower psychological distress than the random number of questions condition after the session. We also found a significant difference in the cognitive change when the number of questions was adapted based on the detected distress level, compared with when the number of questions was fewer than what was appropriate for the level of distress detected. Conclusions: The results show that an ECA adapting the number of Socratic questions based on detected distress levels increases the effectiveness of CBT. Participants who received an adaptive number of questions experienced greater reductions in distress than those who received a random number of questions. In addition, the participants showed a greater amount of cognitive change when the number of questions matched the detected distress level. This suggests that adapting the question quantity based on distress level detection can improve the results of CBT delivered by an ECA.
These results illustrate the advantages of ECAs, paving the way for mental health care that is more tailored and effective. UR - https://formative.jmir.org/2024/1/e50056 UR - http://dx.doi.org/10.2196/50056 UR - http://www.ncbi.nlm.nih.gov/pubmed/38483464 ID - info:doi/10.2196/50056 ER - TY - JOUR AU - Cirone, Katrina AU - Akrout, Mohamed AU - Abid, Latif AU - Oakley, Amanda PY - 2024/3/13 TI - Assessing the Utility of Multimodal Large Language Models (GPT-4 Vision and Large Language and Vision Assistant) in Identifying Melanoma Across Different Skin Tones JO - JMIR Dermatol SP - e55508 VL - 7 KW - melanoma KW - nevus KW - skin pigmentation KW - artificial intelligence KW - AI KW - multimodal large language models KW - large language model KW - large language models KW - LLM KW - LLMs KW - machine learning KW - expert systems KW - natural language processing KW - NLP KW - GPT KW - GPT-4V KW - dermatology KW - skin KW - lesion KW - lesions KW - cancer KW - oncology KW - visual UR - https://derma.jmir.org/2024/1/e55508 UR - http://dx.doi.org/10.2196/55508 UR - http://www.ncbi.nlm.nih.gov/pubmed/38477960 ID - info:doi/10.2196/55508 ER - TY - JOUR AU - Chou, Ya-Hsin AU - Lin, Chemin AU - Lee, Shwu-Hua AU - Lee, Yen-Fen AU - Cheng, Li-Chen PY - 2024/3/13 TI - User-Friendly Chatbot to Mitigate the Psychological Stress of Older Adults During the COVID-19 Pandemic: Development and Usability Study JO - JMIR Form Res SP - e49462 VL - 8 KW - geriatric psychiatry KW - mental health KW - loneliness KW - chatbot KW - user experience KW - health promotion KW - older adults KW - technology-assisted interventions KW - pandemic KW - lonely KW - gerontology KW - elderly KW - develop KW - design KW - development KW - conversational agent KW - geriatric KW - geriatrics KW - psychiatry N2 - Background: To safeguard the most vulnerable individuals during the COVID-19 pandemic, numerous governments enforced measures such as stay-at-home orders, social distancing, and self-isolation. 
These social restrictions had a particularly negative effect on older adults, as they are more vulnerable and experience increased loneliness, which has various adverse effects, including increasing the risk of mental health problems and mortality. Chatbots can potentially reduce loneliness and provide companionship during a pandemic. However, existing chatbots do not cater to the specific needs of older adult populations. Objective: We aimed to develop a user-friendly chatbot tailored to the specific needs of older adults with anxiety or depressive disorders during the COVID-19 pandemic and to examine their perspectives on mental health chatbot use. The primary research objective was to investigate whether chatbots can mitigate the psychological stress of older adults during COVID-19. Methods: Participants were older adults belonging to two age groups (≥65 years and <65 years) from a psychiatric outpatient department who had been diagnosed with depressive or anxiety disorders by certified psychiatrists according to the Diagnostic and Statistical Manual of Mental Disorders (Fifth Edition; DSM-5) criteria. The participants were required to use mobile phones, have internet access, and possess literacy skills. The chatbot's content includes monitoring and tracking health data and providing health information. Participants had access to the chatbot for at least 4 weeks. Self-report questionnaires for loneliness, depression, and anxiety were administered before and after chatbot use. The participants also rated their attitudes toward the chatbot. Results: A total of 35 participants (mean age 65.21, SD 7.51 years) were enrolled in the trial, comprising 74% (n=26) female and 26% (n=9) male participants. The participants demonstrated a high utilization rate during the intervention, with over 82% engaging with the chatbot daily. Loneliness significantly improved in the older group (≥65 years).
This group also responded positively to the chatbot, as evidenced by changes in University of California Los Angeles Loneliness Scale scores, suggesting that this demographic can derive benefits from chatbot interaction. Conversely, the younger group (<65 years) exhibited no significant changes in loneliness after the intervention. Both the older and younger age groups rated the chatbot design favorably for usability (mean scores of 6.33 and 6.05, respectively) and satisfaction (mean scores of 5.33 and 5.15, respectively) on a 7-point Likert scale. Conclusions: The chatbot interface was found to be user-friendly and demonstrated promising results among participants 65 years and older who were receiving care at psychiatric outpatient clinics and experiencing relatively stable symptoms of depression and anxiety. The chatbot not only provided caring companionship but also showed the potential to alleviate loneliness during the challenging circumstances of a pandemic.
UR - https://formative.jmir.org/2024/1/e49462 UR - http://dx.doi.org/10.2196/49462 UR - http://www.ncbi.nlm.nih.gov/pubmed/38477965 ID - info:doi/10.2196/49462 ER - TY - JOUR AU - Nakao, Takahiro AU - Miki, Soichiro AU - Nakamura, Yuta AU - Kikuchi, Tomohiro AU - Nomura, Yukihiro AU - Hanaoka, Shouhei AU - Yoshikawa, Takeharu AU - Abe, Osamu PY - 2024/3/12 TI - Capability of GPT-4V(ision) in the Japanese National Medical Licensing Examination: Evaluation Study JO - JMIR Med Educ SP - e54393 VL - 10 KW - AI KW - artificial intelligence KW - LLM KW - large language model KW - language model KW - language models KW - ChatGPT KW - GPT-4 KW - GPT-4V KW - generative pretrained transformer KW - image KW - images KW - imaging KW - response KW - responses KW - exam KW - examination KW - exams KW - examinations KW - answer KW - answers KW - NLP KW - natural language processing KW - chatbot KW - chatbots KW - conversational agent KW - conversational agents KW - medical education N2 - Background: Previous research applying large language models (LLMs) to medicine was focused on text-based information. Recently, multimodal variants of LLMs acquired the capability of recognizing images. Objective: We aim to evaluate the image recognition capability of generative pretrained transformer (GPT)-4V, a recent multimodal LLM developed by OpenAI, in the medical field by testing how visual information affects its performance in answering questions in the 117th Japanese National Medical Licensing Examination. Methods: We focused on 108 questions that had 1 or more images as part of a question and presented GPT-4V with the same questions under two conditions: (1) with both the question text and associated images and (2) with the question text only. We then compared the difference in accuracy between the 2 conditions using the exact McNemar test.
Results: Among the 108 questions with images, GPT-4V's accuracy was 68% (73/108) when presented with images and 72% (78/108) when presented without images (P=.36). For the 2 question categories, clinical and general, the accuracies with and without images were 71% (70/98) versus 78% (76/98; P=.21) and 30% (3/10) versus 20% (2/10; P≥.99), respectively. Conclusions: The additional information from the images did not significantly improve the performance of GPT-4V in the Japanese National Medical Licensing Examination. UR - https://mededu.jmir.org/2024/1/e54393 UR - http://dx.doi.org/10.2196/54393 UR - http://www.ncbi.nlm.nih.gov/pubmed/38470459 ID - info:doi/10.2196/54393 ER - TY - JOUR AU - Chen, Yan AU - Esmaeilzadeh, Pouyan PY - 2024/3/8 TI - Generative AI in Medical Practice: In-Depth Exploration of Privacy and Security Challenges JO - J Med Internet Res SP - e53008 VL - 26 KW - artificial intelligence KW - AI KW - generative artificial intelligence KW - generative AI KW - medical practices KW - potential benefits KW - security and privacy threats UR - https://www.jmir.org/2024/1/e53008 UR - http://dx.doi.org/10.2196/53008 UR - http://www.ncbi.nlm.nih.gov/pubmed/38457208 ID - info:doi/10.2196/53008 ER - TY - JOUR AU - Davis, Joshua AU - Van Bulck, Liesbet AU - Durieux, N.
Brigitte AU - Lindvall, Charlotta PY - 2024/3/8 TI - The Temperature Feature of ChatGPT: Modifying Creativity for Clinical Research JO - JMIR Hum Factors SP - e53559 VL - 11 KW - artificial intelligence KW - ChatGPT KW - clinical communication KW - creative KW - creativity KW - customization KW - customize KW - customized KW - generation KW - generative KW - language model KW - language models KW - LLM KW - LLMs KW - natural language processing KW - NLP KW - random KW - randomness KW - tailor KW - tailored KW - temperature KW - text KW - texts KW - textual UR - https://humanfactors.jmir.org/2024/1/e53559 UR - http://dx.doi.org/10.2196/53559 UR - http://www.ncbi.nlm.nih.gov/pubmed/38457221 ID - info:doi/10.2196/53559 ER - TY - JOUR AU - Rodriguez, V. Danissa AU - Lawrence, Katharine AU - Gonzalez, Javier AU - Brandfield-Harvey, Beatrix AU - Xu, Lynn AU - Tasneem, Sumaiya AU - Levine, L. Defne AU - Mann, Devin PY - 2024/3/6 TI - Leveraging Generative AI Tools to Support the Development of Digital Solutions in Health Care Research: Case Study JO - JMIR Hum Factors SP - e52885 VL - 11 KW - digital health KW - GenAI KW - generative KW - artificial intelligence KW - ChatGPT KW - software engineering KW - mHealth KW - mobile health KW - app KW - apps KW - application KW - applications KW - diabetes KW - diabetic KW - diabetes prevention KW - digital prescription KW - software KW - engagement KW - behaviour change KW - behavior change KW - developer KW - developers KW - LLM KW - LLMs KW - language model KW - language models KW - NLP KW - natural language processing N2 - Background: Generative artificial intelligence has the potential to revolutionize health technology product development by improving coding quality, efficiency, documentation, quality assessment and review, and troubleshooting. 
Objective: This paper explores the application of a commercially available generative artificial intelligence tool (ChatGPT) to the development of a digital health behavior change intervention designed to support patient engagement in a commercial digital diabetes prevention program. Methods: We examined the capacity, advantages, and limitations of ChatGPT to support digital product idea conceptualization, intervention content development, and the software engineering process, including software requirement generation, software design, and code production. In total, 11 evaluators, each with at least 10 years of experience in fields of study ranging from medicine and implementation science to computer science, participated in the output review process (ChatGPT vs human-generated output). All had familiarity or prior exposure to the original personalized automatic messaging system intervention. The evaluators rated the ChatGPT-produced outputs in terms of understandability, usability, novelty, relevance, completeness, and efficiency. Results: Most metrics received positive scores. We identified that ChatGPT can (1) support developers to achieve high-quality products faster and (2) facilitate nontechnical communication and system understanding between technical and nontechnical team members around the development goal of rapid and easy-to-build computational solutions for medical technologies. Conclusions: ChatGPT can serve as a usable facilitator for researchers engaging in the software development life cycle, from product conceptualization to feature identification and user story development to code generation. 
Trial Registration: ClinicalTrials.gov NCT04049500; https://clinicaltrials.gov/ct2/show/NCT04049500 UR - https://humanfactors.jmir.org/2024/1/e52885 UR - http://dx.doi.org/10.2196/52885 UR - http://www.ncbi.nlm.nih.gov/pubmed/38446539 ID - info:doi/10.2196/52885 ER - TY - JOUR AU - Yoon, Sungwon AU - Tang, Haoming AU - Tan, Min Chao AU - Phang, Kie Jie AU - Kwan, Heng Yu AU - Low, Leng Lian PY - 2024/3/6 TI - Acceptability of Mobile App–Based Motivational Interviewing and Preferences for App Features to Support Self-Management in Patients With Type 2 Diabetes: Qualitative Study JO - JMIR Diabetes SP - e48310 VL - 9 KW - mobile health KW - motivational interviewing KW - diabetes KW - self-management KW - health coaching KW - acceptability KW - application KW - management KW - type 2 diabetes KW - communication KW - patient barrier KW - healthy behavior KW - feedback KW - visualization KW - hybrid model N2 - Background: Patients with type 2 diabetes mellitus (T2DM) experience multiple barriers to improving self-management. Evidence suggests that motivational interviewing (MI), a patient-centered communication method, can address patient barriers and promote healthy behavior. Despite the value of MI, existing MI studies predominantly used face-to-face or phone-based interventions. With the growing adoption of smartphones, automated MI techniques powered by artificial intelligence on mobile devices may offer effective motivational support to patients with T2DM. Objective: This study aimed to explore the perspectives of patients with T2DM on the acceptability of app-based MI in routine health care and collect their feedback on specific MI module features to inform our future intervention. Methods: We conducted semistructured interviews with patients with T2DM, recruited from public primary care clinics. All interviews were audio recorded and transcribed verbatim. Thematic analysis was conducted using NVivo.
Results: In total, 33 patients with T2DM participated in the study. Participants saw MI as a mental reminder to increase motivation and a complementary care model conducive to self-reflection and behavior change. Yet, there was a sense of reluctance, mainly stemming from potential compromise of autonomy in self-care by the introduction of MI. Some participants felt confident in their ability to manage conditions independently, while others reported already making changes and preferred self-management at their own pace. Compared with in-person MI, app-based MI was viewed as offering a more relaxed atmosphere for open sharing without being judged by health care providers. However, participants questioned the lack of human touch, which could potentially undermine a patient-provider therapeutic relationship. To sustain motivation, participants suggested more features of an ongoing supportive nature such as the visualization of milestones, gamified challenges and incremental rewards according to achievements, tailored multimedia resources based on goals, and conversational tools that are interactive and empathic. Conclusions: Our findings suggest the need for a hybrid model of intervention involving both app-based automated MI and human coaching. Patient feedback on specific app features will be incorporated into the module development and tested in a randomized controlled trial. UR - https://diabetes.jmir.org/2024/1/e48310 UR - http://dx.doi.org/10.2196/48310 UR - http://www.ncbi.nlm.nih.gov/pubmed/38446526 ID - info:doi/10.2196/48310 ER - TY - JOUR AU - Roster, Katie AU - Kann, B. Rebecca AU - Farabi, Banu AU - Gronbeck, Christian AU - Brownstone, Nicholas AU - Lipner, R. 
Shari PY - 2024/3/6 TI - Readability and Health Literacy Scores for ChatGPT-Generated Dermatology Public Education Materials: Cross-Sectional Analysis of Sunscreen and Melanoma Questions JO - JMIR Dermatol SP - e50163 VL - 7 KW - ChatGPT KW - artificial intelligence KW - AI KW - LLM KW - LLMs KW - large language model KW - language model KW - language models KW - generative KW - NLP KW - natural language processing KW - health disparities KW - health literacy KW - readability KW - disparities KW - disparity KW - dermatology KW - health information KW - comprehensible KW - comprehensibility KW - understandability KW - patient education KW - public education KW - health education KW - online information UR - https://derma.jmir.org/2024/1/e50163 UR - http://dx.doi.org/10.2196/50163 UR - http://www.ncbi.nlm.nih.gov/pubmed/38446502 ID - info:doi/10.2196/50163 ER - TY - JOUR AU - Reynolds, Kelly AU - Tejasvi, Trilokraj PY - 2024/3/6 TI - Potential Use of ChatGPT in Responding to Patient Questions and Creating Patient Resources JO - JMIR Dermatol SP - e48451 VL - 7 KW - artificial intelligence KW - AI KW - ChatGPT KW - patient resources KW - patient handouts KW - natural language processing software KW - language model KW - language models KW - natural language processing KW - chatbot KW - chatbots KW - conversational agent KW - conversational agents KW - patient education KW - educational resource KW - educational UR - https://derma.jmir.org/2024/1/e48451 UR - http://dx.doi.org/10.2196/48451 UR - http://www.ncbi.nlm.nih.gov/pubmed/38446541 ID - info:doi/10.2196/48451 ER - TY - JOUR AU - Kaplan, M. Deanna AU - Palitsky, Roman AU - Arconada Alvarez, J. Santiago AU - Pozzo, S. Nicole AU - Greenleaf, N. Morgan AU - Atkinson, A. Ciara AU - Lam, A. Wilbur PY - 2024/3/5 TI - What's in a Name?
Experimental Evidence of Gender Bias in Recommendation Letters Generated by ChatGPT JO - J Med Internet Res SP - e51837 VL - 26 KW - chatbot KW - generative artificial intelligence KW - generative AI KW - gender bias KW - large language models KW - letters of recommendation KW - recommendation letter KW - language model KW - chatbots KW - artificial intelligence KW - AI KW - gender-based language KW - human written KW - real-world KW - scenario N2 - Background: Artificial intelligence chatbots such as ChatGPT (OpenAI) have garnered excitement about their potential for delegating writing tasks ordinarily performed by humans. Many of these tasks (eg, writing recommendation letters) have social and professional ramifications, making the potential social biases in ChatGPT's underlying language model a serious concern. Objective: Three preregistered studies used the text analysis program Linguistic Inquiry and Word Count to investigate gender bias in recommendation letters written by ChatGPT in human-use sessions (N=1400 total letters). Methods: We conducted analyses using 22 existing Linguistic Inquiry and Word Count dictionaries, as well as 6 newly created dictionaries based on systematic reviews of gender bias in recommendation letters, to compare recommendation letters generated for the 200 most historically popular "male" and "female" names in the United States. Study 1 used 3 different letter-writing prompts intended to accentuate professional accomplishments associated with male stereotypes, female stereotypes, or neither. Study 2 examined whether lengthening each of the 3 prompts while holding the between-prompt word count constant modified the extent of bias. Study 3 examined the variability within letters generated for the same name and prompts.
We hypothesized that when prompted with gender-stereotyped professional accomplishments, ChatGPT would evidence gender-based language differences replicating those found in systematic reviews of human-written recommendation letters (eg, more affiliative, social, and communal language for female names; more agentic and skill-based language for male names). Results: Significant differences in language between letters generated for female versus male names were observed across all prompts, including the prompt hypothesized to be neutral, and across nearly all language categories tested. Historically female names received significantly more social referents (5/6, 83% of prompts), communal or doubt-raising language (4/6, 67% of prompts), personal pronouns (4/6, 67% of prompts), and clout language (5/6, 83% of prompts). Contradicting the study hypotheses, some gender differences (eg, achievement language and agentic language) were significant in both the hypothesized and nonhypothesized directions, depending on the prompt. Heteroscedasticity between male and female names was observed in multiple linguistic categories, with greater variance for historically female names than for historically male names. Conclusions: ChatGPT reproduces many gender-based language biases that have been reliably identified in investigations of human-written reference letters, although these differences vary across prompts and language categories. Caution should be taken when using ChatGPT for tasks that have social consequences, such as reference letter writing. The methods developed in this study may be useful for ongoing bias testing among progressive generations of chatbots across a range of real-world scenarios. Trial Registration: OSF Registries osf.io/ztv96; https://osf.io/ztv96 UR - https://www.jmir.org/2024/1/e51837 UR - http://dx.doi.org/10.2196/51837 UR - http://www.ncbi.nlm.nih.gov/pubmed/38441945 ID - info:doi/10.2196/51837 ER - TY - JOUR AU - Deiner, S. Michael AU - Deiner, A. 
Natalie AU - Hristidis, Vagelis AU - McLeod, D. Stephen AU - Doan, Thuy AU - Lietman, M. Thomas AU - Porco, C. Travis PY - 2024/3/1 TI - Use of Large Language Models to Assess the Likelihood of Epidemics From the Content of Tweets: Infodemiology Study JO - J Med Internet Res SP - e49139 VL - 26 KW - conjunctivitis KW - microblog KW - social media KW - generative large language model KW - Generative Pre-trained Transformers KW - GPT-3.5 KW - GPT-4 KW - epidemic detection KW - Twitter KW - X formerly known as Twitter KW - infectious eye disease N2 - Background: Previous work suggests that Google searches could be useful in identifying conjunctivitis epidemics. Content-based assessment of social media content may provide additional value in serving as early indicators of conjunctivitis and other systemic infectious diseases. Objective: We investigated whether large language models, specifically GPT-3.5 and GPT-4 (OpenAI), can provide probabilistic assessments of whether social media posts about conjunctivitis could indicate a regional outbreak. Methods: A total of 12,194 conjunctivitis-related tweets were obtained using a targeted Boolean search in multiple languages from India, Guam (United States), Martinique (France), the Philippines, American Samoa (United States), Fiji, Costa Rica, Haiti, and the Bahamas, covering the time frame from January 1, 2012, to March 13, 2023. By providing these tweets via prompts to GPT-3.5 and GPT-4, we obtained probabilistic assessments that were validated by 2 human raters. We then calculated Pearson correlations of these time series with tweet volume and the occurrence of known outbreaks in these 9 locations, with time series bootstrap used to compute CIs. Results: Probabilistic assessments derived from GPT-3.5 showed correlations of 0.60 (95% CI 0.47-0.70) and 0.53 (95% CI 0.40-0.65) with the 2 human raters, with higher results for GPT-4. 
The weekly averages of GPT-3.5 probabilities showed substantial correlations with weekly tweet volume for 44% (4/9) of the countries, with correlations ranging from 0.10 (95% CI 0.0-0.29) to 0.53 (95% CI 0.39-0.89), with larger correlations for GPT-4. Correlations with known epidemics were more modest, with substantial correlation only in American Samoa (0.40, 95% CI 0.16-0.81). Conclusions: These findings suggest that GPT prompting can efficiently assess the content of social media posts and indicate possible disease outbreaks to a degree of accuracy comparable to that of humans. Furthermore, we found that automated content analysis of tweets is related to tweet volume for conjunctivitis-related posts in some locations and to the occurrence of actual epidemics. Future work may improve the sensitivity and specificity of these methods for disease outbreak detection. UR - https://www.jmir.org/2024/1/e49139 UR - http://dx.doi.org/10.2196/49139 UR - http://www.ncbi.nlm.nih.gov/pubmed/38427404 ID - info:doi/10.2196/49139 ER - TY - JOUR AU - Willms, Amanda AU - Liu, Sam PY - 2024/2/29 TI - Exploring the Feasibility of Using ChatGPT to Create Just-in-Time Adaptive Physical Activity mHealth Intervention Content: Case Study JO - JMIR Med Educ SP - e51426 VL - 10 KW - ChatGPT KW - digital health KW - mobile health KW - mHealth KW - physical activity KW - application KW - mobile app KW - mobile apps KW - content creation KW - behavior change KW - app design N2 - Background: Achieving physical activity (PA) guidelines' recommendation of 150 minutes of moderate-to-vigorous PA per week has been shown to reduce the risk of many chronic conditions. Despite the overwhelming evidence in this field, PA levels remain low globally. By creating engaging mobile health (mHealth) interventions through strategies such as just-in-time adaptive interventions (JITAIs) that are tailored to an individual's dynamic state, there is potential to increase PA levels.
However, generating personalized content can take a long time due to various versions of content required for the personalization algorithms. ChatGPT presents an incredible opportunity to rapidly produce tailored content; however, there is a lack of studies exploring its feasibility. Objective: This study aimed to (1) explore the feasibility of using ChatGPT to create content for a PA JITAI mobile app and (2) describe lessons learned and future recommendations for using ChatGPT in the development of mHealth JITAI content. Methods: During phase 1, we used Pathverse, a no-code app builder, and ChatGPT to develop a JITAI app to help parents support their child's PA levels. The intervention was developed based on the Multi-Process Action Control (M-PAC) framework, and the necessary behavior change techniques targeting the M-PAC constructs were implemented in the app design to help parents support their child's PA. The acceptability of using ChatGPT for this purpose was discussed to determine its feasibility. In phase 2, we summarized the lessons we learned during the JITAI content development process using ChatGPT and generated recommendations to inform future similar use cases. Results: In phase 1, by using specific prompts, we efficiently generated content for 13 lessons relating to increasing parental support for their child's PA following the M-PAC framework. It was determined that using ChatGPT for this case study to develop PA content for a JITAI was acceptable. In phase 2, we summarized our recommendations into the following six steps when using ChatGPT to create content for mHealth behavior interventions: (1) determine target behavior, (2) ground the intervention in behavior change theory, (3) design the intervention structure, (4) input intervention structure and behavior change constructs into ChatGPT, (5) revise the ChatGPT response, and (6) customize the response to be used in the intervention.
Conclusions: ChatGPT offers a remarkable opportunity for rapid content creation in the context of an mHealth JITAI. Although our case study demonstrated that ChatGPT was acceptable, it is essential to approach its use, along with other language models, with caution. Before delivering content to population groups, expert review is crucial to ensure accuracy and relevancy. Future research and application of these guidelines are imperative as we deepen our understanding of ChatGPT and its interactions with human input. UR - https://mededu.jmir.org/2024/1/e51426 UR - http://dx.doi.org/10.2196/51426 UR - http://www.ncbi.nlm.nih.gov/pubmed/38421689 ID - info:doi/10.2196/51426 ER - TY - JOUR AU - Jabir, Ishqi Ahmad AU - Lin, Xiaowen AU - Martinengo, Laura AU - Sharp, Gemma AU - Theng, Yin-Leng AU - Tudor Car, Lorainne PY - 2024/2/27 TI - Attrition in Conversational Agent–Delivered Mental Health Interventions: Systematic Review and Meta-Analysis JO - J Med Internet Res SP - e48168 VL - 26 KW - conversational agent KW - chatbot KW - mental health KW - mHealth KW - attrition KW - dropout KW - mobile phone KW - artificial intelligence KW - AI KW - systematic review KW - meta-analysis KW - digital health interventions N2 - Background: Conversational agents (CAs) or chatbots are computer programs that mimic human conversation. They have the potential to improve access to mental health interventions through automated, scalable, and personalized delivery of psychotherapeutic content. However, digital health interventions, including those delivered by CAs, often have high attrition rates. Identifying the factors associated with attrition is critical to improving future clinical trials.
Objective: This review aims to estimate the overall and differential rates of attrition in CA-delivered mental health interventions (CA interventions), evaluate the impact of study design and intervention-related aspects on attrition, and describe study design features aimed at reducing or mitigating study attrition. Methods: We searched PubMed, Embase (Ovid), PsycINFO (Ovid), Cochrane Central Register of Controlled Trials, and Web of Science, and conducted a gray literature search on Google Scholar in June 2022. We included randomized controlled trials that compared CA interventions against control groups and excluded studies that lasted for 1 session only and used Wizard of Oz interventions. We also assessed the risk of bias in the included studies using the Cochrane Risk of Bias Tool 2.0. Random-effects proportional meta-analysis was applied to calculate the pooled dropout rates in the intervention groups. Random-effects meta-analysis was used to compare the attrition rate in the intervention groups with that in the control groups. We used a narrative review to summarize the findings. Results: The systematic search retrieved 4566 records from peer-reviewed databases and citation searches, of which 41 (0.90%) randomized controlled trials met the inclusion criteria. The meta-analytic overall attrition rate in the intervention group was 21.84% (95% CI 16.74%-27.36%; I2=94%). Short-term studies that lasted ≤8 weeks showed a lower attrition rate (18.05%, 95% CI 9.91%-27.76%; I2=94.6%) than long-term studies that lasted >8 weeks (26.59%, 95% CI 20.09%-33.63%; I2=93.89%). Intervention group participants were more likely to attrit than control group participants for short-term (log odds ratio 1.22, 95% CI 0.99-1.50; I2=21.89%) and long-term studies (log odds ratio 1.33, 95% CI 1.08-1.65; I2=49.43%).
Intervention-related characteristics associated with higher attrition include stand-alone CA interventions without human support, not having a symptom tracker feature, no visual representation of the CA, and comparing CA interventions with waitlist controls. No participant-level factor reliably predicted attrition. Conclusions: Our results indicated that approximately one-fifth of the participants will drop out from CA interventions in short-term studies. High heterogeneities made it difficult to generalize the findings. Our results suggested that future CA interventions should adopt a blended design with human support, use symptom tracking, compare CA intervention groups against active controls rather than waitlist controls, and include a visual representation of the CA to reduce the attrition rate. Trial Registration: PROSPERO International Prospective Register of Systematic Reviews CRD42022341415; https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42022341415 UR - https://www.jmir.org/2024/1/e48168 UR - http://dx.doi.org/10.2196/48168 UR - http://www.ncbi.nlm.nih.gov/pubmed/38412023 ID - info:doi/10.2196/48168 ER - TY - JOUR AU - Mills, Rhiana AU - Mangone, Rose Emily AU - Lesh, Neal AU - Jayal, Gayatri AU - Mohan, Diwakar AU - Baraitser, Paula PY - 2024/2/27 TI - Chatbots That Deliver Contraceptive Support: Systematic Review JO - J Med Internet Res SP - e46758 VL - 26 KW - chatbot KW - contraceptives KW - digital health KW - AI KW - systematic review KW - conversational agent KW - development best practices KW - development KW - counseling KW - communication KW - user feedback KW - users KW - feedback KW - attitudes KW - behavior N2 - Background: A chatbot is a computer program that is designed to simulate conversation with humans. Chatbots may offer rapid, responsive, and private contraceptive information; counseling; and linkages to products and services, which could improve contraceptive knowledge, attitudes, and behaviors. 
Objective: This review aimed to systematically collate and interpret evidence to determine whether and how chatbots improve contraceptive knowledge, attitudes, and behaviors. Contraceptive knowledge, attitudes, and behaviors include access to contraceptive information, understanding of contraceptive information, access to contraceptive services, contraceptive uptake, contraceptive continuation, and contraceptive communication or negotiation skills. A secondary aim of the review is to identify and summarize best practice recommendations for chatbot development to improve contraceptive outcomes, including the cost-effectiveness of chatbots where evidence is available. Methods: We systematically searched peer-reviewed and gray literature (2010-2022) for papers that evaluated chatbots offering contraceptive information and services. Sources were included if they featured a chatbot and addressed an element of contraception, for example, uptake of hormonal contraceptives. Literature was assessed for methodological quality using appropriate quality assessment tools. Data were extracted from the included sources using a data extraction framework. A narrative synthesis approach was used to collate qualitative evidence as quantitative evidence was too sparse for a quantitative synthesis to be carried out. Results: We identified 15 sources, including 8 original research papers and 7 gray literature papers. These sources included 16 unique chatbots. This review found the following evidence on the impact and efficacy of chatbots: a large, robust randomized controlled trial suggests that chatbots have no effect on intention to use contraception; a small, uncontrolled cohort study suggests increased uptake of contraception among adolescent girls; and a development report, using poor-quality methods, suggests no impact on improved access to services. There is also poor-quality evidence to suggest increased contraceptive knowledge from interacting with chatbot content. 
User engagement was mixed, with some chatbots reaching wide audiences and others reaching very small audiences. User feedback suggests that chatbots may be experienced as acceptable, convenient, anonymous, and private, but also as incompetent, inconvenient, and unsympathetic. The best practice guidance on the development of chatbots to improve contraceptive knowledge, attitudes, and behaviors is consistent with that in the literature on chatbots in other health care fields. Conclusions: We found limited and conflicting evidence on chatbots to improve contraceptive knowledge, attitudes, and behaviors. Further research that examines the impact of chatbot interventions in comparison with alternative technologies, acknowledges the varied and changing nature of chatbot interventions, and seeks to identify key features associated with improved contraceptive outcomes is needed. The limitations of this review include the limited evidence available on this topic, the lack of formal evaluation of chatbots in this field, and the lack of standardized definition of what a chatbot is. 
UR - https://www.jmir.org/2024/1/e46758 UR - http://dx.doi.org/10.2196/46758 UR - http://www.ncbi.nlm.nih.gov/pubmed/38412028 ID - info:doi/10.2196/46758 ER - TY - JOUR AU - Hakam, Tarek Hassan AU - Prill, Robert AU - Korte, Lisa AU - Lovreković, Bruno AU - Ostojić, Marko AU - Ramadanov, Nikolai AU - Muehlensiepen, Felix PY - 2024/2/16 TI - Human-Written vs AI-Generated Texts in Orthopedic Academic Literature: Comparative Qualitative Analysis JO - JMIR Form Res SP - e52164 VL - 8 KW - artificial intelligence KW - AI KW - large language model KW - LLM KW - research KW - orthopedic surgery KW - sports medicine KW - orthopedics KW - surgery KW - orthopedic KW - qualitative study KW - medical database KW - feedback KW - detection KW - tool KW - scientific integrity KW - study design N2 - Background: As large language models (LLMs) are becoming increasingly integrated into different aspects of health care, questions about the implications for medical academic literature have begun to emerge. Key aspects such as authenticity in academic writing are at stake with artificial intelligence (AI) generating highly linguistically accurate and grammatically sound texts. Objective: The objective of this study is to compare human-written with AI-generated scientific literature in orthopedics and sports medicine. Methods: Five original abstracts were selected from the PubMed database. These abstracts were subsequently rewritten with the assistance of 2 LLMs with different degrees of proficiency. Subsequently, researchers with varying degrees of expertise and with different areas of specialization were asked to rank the abstracts according to linguistic and methodological parameters. Finally, researchers had to classify the articles as AI generated or human written. Results: Neither the researchers nor the AI-detection software could successfully identify the AI-generated texts.
Furthermore, the criteria previously suggested in the literature did not correlate with whether the researchers deemed a text to be AI generated or whether they judged the article correctly based on these parameters. Conclusions: The primary finding of this study was that researchers were unable to distinguish between LLM-generated and human-written texts. However, due to the small sample size, it is not possible to generalize the results of this study. As is the case with any tool used in academic research, the potential to cause harm can be mitigated by relying on the transparency and integrity of the researchers. With scientific integrity at stake, further research with a similar study design should be conducted to determine the magnitude of this issue. UR - https://formative.jmir.org/2024/1/e52164 UR - http://dx.doi.org/10.2196/52164 UR - http://www.ncbi.nlm.nih.gov/pubmed/38363631 ID - info:doi/10.2196/52164 ER - TY - JOUR AU - Sallam, Malik AU - Barakat, Muna AU - Sallam, Mohammed PY - 2024/2/15 TI - A Preliminary Checklist (METRICS) to Standardize the Design and Reporting of Studies on Generative Artificial Intelligence–Based Models in Health Care Education and Practice: Development Study Involving a Literature Review JO - Interact J Med Res SP - e54704 VL - 13 KW - guidelines KW - evaluation KW - meaningful analytics KW - large language models KW - decision support N2 - Background: Adherence to evidence-based practice is indispensable in health care. Recently, the utility of generative artificial intelligence (AI) models in health care has been evaluated extensively. However, the lack of consensus guidelines on the design and reporting of findings of these studies poses a challenge for the interpretation and synthesis of evidence. Objective: This study aimed to develop a preliminary checklist to standardize the reporting of generative AI-based studies in health care education and practice.
Methods: A literature review was conducted in Scopus, PubMed, and Google Scholar. Published records with "ChatGPT," "Bing," or "Bard" in the title were retrieved. Careful examination of the methodologies employed in the included records was conducted to identify the common pertinent themes and the possible gaps in reporting. A panel discussion was held to establish a unified and thorough checklist for the reporting of AI studies in health care. The finalized checklist was used to evaluate the included records by 2 independent raters. Cohen κ was used as the method to evaluate the interrater reliability. Results: The final data set that formed the basis for pertinent theme identification and analysis comprised a total of 34 records. The finalized checklist included 9 pertinent themes collectively referred to as METRICS (Model, Evaluation, Timing, Range/Randomization, Individual factors, Count, and Specificity of prompts and language). Their details are as follows: (1) Model used and its exact settings; (2) Evaluation approach for the generated content; (3) Timing of testing the model; (4) Transparency of the data source; (5) Range of tested topics; (6) Randomization of selecting the queries; (7) Individual factors in selecting the queries and interrater reliability; (8) Count of queries executed to test the model; and (9) Specificity of the prompts and language used. The overall mean METRICS score was 3.0 (SD 0.58). The tested METRICS score was acceptable, with the range of Cohen κ of 0.558 to 0.962 (P<.001 for the 9 tested items). With classification per item, the highest average METRICS score was recorded for the "Model" item, followed by the "Specificity" item, while the lowest scores were recorded for the "Randomization" item (classified as suboptimal) and "Individual factors" item (classified as satisfactory). Conclusions: The METRICS checklist can facilitate the design of studies guiding researchers toward best practices in reporting results.
The findings highlight the need for standardized reporting algorithms for generative AI-based studies in health care, considering the variability observed in methodologies and reporting. The proposed METRICS checklist could be a preliminary helpful base to establish a universally accepted approach to standardize the design and reporting of generative AI-based studies in health care, which is a swiftly evolving research topic. UR - https://www.i-jmr.org/2024/1/e54704 UR - http://dx.doi.org/10.2196/54704 UR - http://www.ncbi.nlm.nih.gov/pubmed/38276872 ID - info:doi/10.2196/54704 ER - TY - JOUR AU - Ni, Zhao AU - Peng, L. Mary AU - Balakrishnan, Vimala AU - Tee, Vincent AU - Azwa, Iskandar AU - Saifi, Rumana AU - Nelson, E. LaRon AU - Vlahov, David AU - Altice, L. Frederick PY - 2024/2/15 TI - Implementation of Chatbot Technology in Health Care: Protocol for a Bibliometric Analysis JO - JMIR Res Protoc SP - e54349 VL - 13 KW - artificial intelligence KW - AI KW - bibliometric analysis KW - chatbots KW - health care KW - health promotion N2 - Background: Chatbots have the potential to increase people's access to quality health care. However, the implementation of chatbot technology in the health care system is unclear due to the scarce analysis of publications on the adoption of chatbots in health and medical settings. Objective: This paper presents a protocol of a bibliometric analysis aimed at offering the public insights into the current state and emerging trends in research related to the use of chatbot technology for promoting health. Methods: In this bibliometric analysis, we will select published papers from the databases of CINAHL, IEEE Xplore, PubMed, Scopus, and Web of Science that pertain to chatbot technology and its applications in health care. Our search strategy includes keywords such as "chatbot," "virtual agent," "virtual assistant," "conversational agent," "conversational AI," "interactive agent," "health," and "healthcare."
Five researchers who are AI engineers and clinicians will independently review the titles and abstracts of selected papers to determine their eligibility for a full-text review. The corresponding author (ZN) will serve as a mediator to address any discrepancies and disputes among the 5 reviewers. Our analysis will encompass various publication patterns of chatbot research, including the number of annual publications, their geographic or institutional distribution, and the number of annual grants supporting chatbot research, and further summarize the methodologies used in the development of health-related chatbots, along with their features and applications in health care settings. The software tool VOSviewer (version 1.6.19; Leiden University) will be used to construct and visualize bibliometric networks. Results: The preparation for the bibliometric analysis began on December 3, 2021, when the research team started the process of familiarizing themselves with the software tools that may be used in this analysis, VOSviewer and CiteSpace, during which they consulted 3 librarians at Yale University regarding search terms and tentative results. Tentative searches on the aforementioned databases yielded a total of 2340 papers. The official search phase started on July 27, 2023. Our goal is to complete the screening of papers and the analysis by February 15, 2024. Conclusions: Artificial intelligence chatbots, such as ChatGPT (OpenAI Inc), have sparked numerous discussions within the health care industry regarding their impact on human health. Chatbot technology holds substantial promise for advancing health care systems worldwide. However, developing a sophisticated chatbot capable of precise interaction with health care consumers, delivering personalized care, and providing accurate health-related information and knowledge remains a considerable challenge.
This bibliometric analysis seeks to fill the knowledge gap in the existing literature on health-related chatbots, entailing their applications, the software used in their development, and their preferred functionalities among users. International Registered Report Identifier (IRRID): PRR1-10.2196/54349 UR - https://www.researchprotocols.org/2024/1/e54349 UR - http://dx.doi.org/10.2196/54349 UR - http://www.ncbi.nlm.nih.gov/pubmed/38228575 ID - info:doi/10.2196/54349 ER - TY - JOUR AU - Abdullahi, Tassallah AU - Singh, Ritambhara AU - Eickhoff, Carsten PY - 2024/2/13 TI - Learning to Make Rare and Complex Diagnoses With Generative AI Assistance: Qualitative Study of Popular Large Language Models JO - JMIR Med Educ SP - e51391 VL - 10 KW - clinical decision support KW - rare diseases KW - complex diseases KW - prompt engineering KW - reliability KW - consistency KW - natural language processing KW - language model KW - Bard KW - ChatGPT 3.5 KW - GPT-4 KW - MedAlpaca KW - medical education KW - complex diagnosis KW - artificial intelligence KW - AI assistance KW - medical training KW - prediction model N2 - Background: Patients with rare and complex diseases often experience delayed diagnoses and misdiagnoses because comprehensive knowledge about these diseases is limited to only a few medical experts. In this context, large language models (LLMs) have emerged as powerful knowledge aggregation tools with applications in clinical decision support and education domains. Objective: This study aims to explore the potential of 3 popular LLMs, namely Bard (Google LLC), ChatGPT-3.5 (OpenAI), and GPT-4 (OpenAI), in medical education to enhance the diagnosis of rare and complex diseases while investigating the impact of prompt engineering on their performance. Methods: We conducted experiments on publicly available complex and rare cases to achieve these objectives. 
We implemented various prompt strategies to evaluate the performance of these models using both open-ended and multiple-choice prompts. In addition, we used a majority voting strategy to leverage diverse reasoning paths within language models, aiming to enhance their reliability. Furthermore, we compared their performance with the performance of human respondents and MedAlpaca, a generative LLM specifically designed for medical tasks. Results: Notably, all LLMs outperformed the average human consensus and MedAlpaca, with a minimum margin of 5% and 13%, respectively, across all 30 cases from the diagnostic case challenge collection. On the frequently misdiagnosed cases category, Bard tied with MedAlpaca but surpassed the human average consensus by 14%, whereas GPT-4 and ChatGPT-3.5 outperformed MedAlpaca and the human respondents on the moderately often misdiagnosed cases category with minimum accuracy scores of 28% and 11%, respectively. The majority voting strategy, particularly with GPT-4, demonstrated the highest overall score across all cases from the diagnostic complex case collection, surpassing that of other LLMs. On the Medical Information Mart for Intensive Care-III data sets, Bard and GPT-4 achieved the highest diagnostic accuracy scores, with multiple-choice prompts scoring 93%, whereas ChatGPT-3.5 and MedAlpaca scored 73% and 47%, respectively. Furthermore, our results demonstrate that there is no one-size-fits-all prompting approach for improving the performance of LLMs and that a single strategy does not universally apply to all LLMs. Conclusions: Our findings shed light on the diagnostic capabilities of LLMs and the challenges associated with identifying an optimal prompting strategy that aligns with each language model's characteristics and specific task requirements. The significance of prompt engineering is highlighted, providing valuable insights for researchers and practitioners who use these language models for medical training.
Furthermore, this study represents a crucial step toward understanding how LLMs can enhance diagnostic reasoning in rare and complex medical cases, paving the way for developing effective educational tools and accurate diagnostic aids to improve patient care and outcomes. UR - https://mededu.jmir.org/2024/1/e51391 UR - http://dx.doi.org/10.2196/51391 UR - http://www.ncbi.nlm.nih.gov/pubmed/38349725 ID - info:doi/10.2196/51391 ER - TY - JOUR AU - Ma, Yuanchao AU - Achiche, Sofiane AU - Pomey, Marie-Pascale AU - Paquette, Jesseca AU - Adjtoutah, Nesrine AU - Vicente, Serge AU - Engler, Kim AU - AU - Laymouna, Moustafa AU - Lessard, David AU - Lemire, Benoît AU - Asselah, Jamil AU - Therrien, Rachel AU - Osmanlliu, Esli AU - Zawati, H. Ma'n AU - Joly, Yann AU - Lebouché, Bertrand PY - 2024/2/13 TI - Adapting and Evaluating an AI-Based Chatbot Through Patient and Stakeholder Engagement to Provide Information for Different Health Conditions: Master Protocol for an Adaptive Platform Trial (the MARVIN Chatbots Study) JO - JMIR Res Protoc SP - e54668 VL - 13 KW - chatbot KW - master protocol KW - adaptive platform trial design KW - implementation science KW - telehealth KW - digital health KW - Canada KW - artificial intelligence KW - conversational agent KW - self-management KW - research ethics KW - patient and stakeholder engagement KW - co-construction KW - mobile phone N2 - Background: Artificial intelligence (AI)–based chatbots could help address some of the challenges patients face in acquiring information essential to their self-health management, including unreliable sources and overburdened health care professionals. Research to ensure the proper design, implementation, and uptake of chatbots is imperative. Inclusive digital health research and responsible AI integration into health care require active and sustained patient and stakeholder engagement, yet corresponding activities and guidance are limited for this purpose.
Objective: In response, this manuscript presents a master protocol for the development, testing, and implementation of a chatbot family in partnership with stakeholders. This protocol aims to help efficiently translate an initial chatbot intervention (MARVIN) to multiple health domains and populations. Methods: The MARVIN chatbots study has an adaptive platform trial design consisting of multiple parallel individual chatbot substudies with four common objectives: (1) co-construct a tailored AI chatbot for a specific health care setting, (2) assess its usability with a small sample of participants, (3) measure implementation outcomes (usability, acceptability, appropriateness, adoption, and fidelity) within a large sample, and (4) evaluate the impact of patient and stakeholder partnerships on chatbot development. For objective 1, a needs assessment will be conducted within the setting, involving four 2-hour focus groups with 5 participants each. Then, a co-construction design committee will be formed with patient partners, health care professionals, and researchers who will participate in 6 workshops for chatbot development, testing, and improvement. For objective 2, a total of 30 participants will interact with the prototype for 3 weeks and assess its usability through a survey and 3 focus groups. Positive usability outcomes will lead to the initiation of objective 3, whereby the public will be able to access the chatbot for a 12-month real-world implementation study using web-based questionnaires to measure usability, acceptability, and appropriateness for 150 participants and meta-use data to inform adoption and fidelity. After each objective, for objective 4, focus groups will be conducted with the design committee to better understand their perspectives on the engagement process. 
Results: From July 2022 to October 2023, this master protocol led to four substudies conducted at the McGill University Health Centre or the Centre hospitalier de l'Université de Montréal (both in Montreal, Quebec, Canada): (1) MARVIN for HIV (large-scale implementation expected in mid-2024), (2) MARVIN-Pharma for community pharmacists providing HIV care (usability study planned for mid-2024), (3) MARVINA for breast cancer, and (4) MARVIN-CHAMP for pediatric infectious conditions (both in preparation, with development to begin in early 2024). Conclusions: This master protocol offers an approach to chatbot development in partnership with patients and health care professionals that includes a comprehensive assessment of implementation outcomes. It also contributes to best practice recommendations for patient and stakeholder engagement in digital health research. Trial Registration: ClinicalTrials.gov NCT05789901; https://classic.clinicaltrials.gov/ct2/show/NCT05789901 International Registered Report Identifier (IRRID): PRR1-10.2196/54668 UR - https://www.researchprotocols.org/2024/1/e54668 UR - http://dx.doi.org/10.2196/54668 UR - http://www.ncbi.nlm.nih.gov/pubmed/38349734 ID - info:doi/10.2196/54668 ER - TY - JOUR AU - Giunti, Guido AU - Doherty, P. Colin PY - 2024/2/12 TI - Cocreating an Automated mHealth Apps Systematic Review Process With Generative AI: Design Science Research Approach JO - JMIR Med Educ SP - e48949 VL - 10 KW - generative artificial intelligence KW - mHealth KW - ChatGPT KW - evidence-base KW - apps KW - qualitative study KW - design science research KW - eHealth KW - mobile device KW - AI KW - language model KW - mHealth intervention KW - generative AI KW - AI tool KW - software code KW - systematic review N2 - Background: The use of mobile devices for delivering health-related services (mobile health [mHealth]) has rapidly increased, leading to a demand for summarizing the state of the art and practice through systematic reviews.
However, the systematic review process is resource intensive and time-consuming. Generative artificial intelligence (AI) has emerged as a potential solution to automate tedious tasks. Objective: This study aimed to explore the feasibility of using generative AI tools to automate time-consuming and resource-intensive tasks in a systematic review process and assess the scope and limitations of using such tools. Methods: We used the design science research methodology. The solution proposed is to use cocreation with a generative AI, such as ChatGPT, to produce software code that automates the process of conducting systematic reviews. Results: A triggering prompt was generated, and assistance from the generative AI was used to guide the steps toward developing, executing, and debugging a Python script. Errors in code were solved through conversational exchange with ChatGPT, and a tentative script was created. The code pulled the mHealth solutions from the Google Play Store and searched their descriptions for keywords that hinted toward an evidence base. The results were exported to a CSV file, which was compared to the initial outputs of other similar systematic review processes. Conclusions: This study demonstrates the potential of using generative AI to automate the time-consuming process of conducting systematic reviews of mHealth apps. This approach could be particularly useful for researchers with limited coding skills. However, the study has limitations related to the design science research methodology, subjectivity bias, and the quality of the search results used to train the language model. UR - https://mededu.jmir.org/2024/1/e48949 UR - http://dx.doi.org/10.2196/48949 UR - http://www.ncbi.nlm.nih.gov/pubmed/38345839 ID - info:doi/10.2196/48949 ER - TY - JOUR AU - Chang, Fangyuan AU - Sheng, Lin AU - Gu, Zhenyu PY - 2024/2/12 TI - Investigating the Integration and the Long-Term Use of Smart Speakers in Older Adults'
Daily Practices: Qualitative Study JO - JMIR Mhealth Uhealth SP - e47472 VL - 12 KW - smart speaker KW - private home KW - older adults KW - long-term use KW - daily practices KW - smart speakers N2 - Background: As smart speakers become more popular, there have been an increasing number of studies on how they may benefit older adults or how older adults perceive them. Despite the increasing ownership rates of smart speakers among older adults, studies that examine their integration and the long-term use in older adults' daily practices are scarce. Objective: This study aims to uncover the integration of smart speakers into the daily practices of older adults over the long term, contributing to an in-depth understanding of maintained technology use among this demographic. Methods: To achieve these objectives, the study interviewed 20 older adults who had been using smart speakers for over 6 months. These semistructured interviews enabled participants to share their insights and experiences regarding the maintained use of smart speakers in the long term. Results: We identified 4 dimensions of the long-term use of smart speakers among older adults, including functional integration, spatial integration, cognitive integration, and semantic integration. For the functional integration of smart speakers, the study reported different types of use, including entertainment, information collection, medication reminders, companionship, environment modification, and emergency calls. For the spatial integration of smart speakers, the study showed older adults' agency in defining, changing, and reshaping daily practices through the spatial organization of smart speakers. For the cognitive integration of smart speakers, the findings showed the cognitive processes involved in adapting to and incorporating smart speakers into daily habits and routines. For the semantic integration of smart speakers, the findings revealed that older adults'
enjoyable user experience and strong bonds with the device contributed to their acceptance of occasional functional errors. Finally, the study proposed several suggestions for designers and developers to better design smart speakers that promote maintainable use behaviors among older adults. Conclusions: On the basis of the findings, this study highlighted the importance of understanding how older adults use smart speakers and the practices through which they integrate them into their daily routines. The findings suggest that smart speakers can provide significant benefits for older adults, including increased convenience and improved quality of life. However, to promote maintainable use behaviors, designers and developers should consider more about the technology use contexts and the specific needs and preferences of older adults when designing these devices. UR - https://mhealth.jmir.org/2024/1/e47472 UR - http://dx.doi.org/10.2196/47472 UR - http://www.ncbi.nlm.nih.gov/pubmed/38345844 ID - info:doi/10.2196/47472 ER - TY - JOUR AU - Yu, Peng AU - Fang, Changchang AU - Liu, Xiaolin AU - Fu, Wanying AU - Ling, Jitao AU - Yan, Zhiwei AU - Jiang, Yuan AU - Cao, Zhengyu AU - Wu, Maoxiong AU - Chen, Zhiteng AU - Zhu, Wengen AU - Zhang, Yuling AU - Abudukeremu, Ayiguli AU - Wang, Yue AU - Liu, Xiao AU - Wang, Jingfeng PY - 2024/2/9 TI - Performance of ChatGPT on the Chinese Postgraduate Examination for Clinical Medicine: Survey Study JO - JMIR Med Educ SP - e48514 VL - 10 KW - ChatGPT KW - Chinese Postgraduate Examination for Clinical Medicine KW - medical student KW - performance KW - artificial intelligence KW - medical care KW - qualitative feedback KW - medical education KW - clinical decision-making N2 - Background: ChatGPT, an artificial intelligence (AI) based on large-scale language models, has sparked interest in the field of health care. 
Nonetheless, the capabilities of AI in text comprehension and generation are constrained by the quality and volume of available training data for a specific language, and the performance of AI across different languages requires further investigation. While AI harbors substantial potential in medicine, it is imperative to tackle challenges such as the formulation of clinical care standards; facilitating cultural transitions in medical education and practice; and managing ethical issues including data privacy, consent, and bias. Objective: The study aimed to evaluate ChatGPT's performance in processing Chinese Postgraduate Examination for Clinical Medicine questions, assess its clinical reasoning ability, investigate potential limitations with the Chinese language, and explore its potential as a valuable tool for medical professionals in the Chinese context. Methods: A data set of Chinese Postgraduate Examination for Clinical Medicine questions was used to assess the effectiveness of ChatGPT's (version 3.5) medical knowledge in the Chinese language, comprising 165 medical questions that were divided into three categories: (1) common questions (n=90) assessing basic medical knowledge, (2) case analysis questions (n=45) focusing on clinical decision-making through patient case evaluations, and (3) multichoice questions (n=30) requiring the selection of multiple correct answers. First, we assessed whether ChatGPT could meet the stringent cutoff score defined by the government agency, which requires a performance within the top 20% of candidates. Additionally, in our evaluation of ChatGPT's performance on both original and encoded medical questions, 3 primary indicators were used: accuracy, concordance (which validates the answer), and the frequency of insights.
Results: Our evaluation revealed that ChatGPT scored 153.5 out of 300 for original questions in Chinese, which signifies the minimum score set to ensure that at least 20% more candidates pass than the enrollment quota. However, ChatGPT had low accuracy in answering open-ended medical questions, with only 31.5% total accuracy. The accuracy for common questions, multichoice questions, and case analysis questions was 42%, 37%, and 17%, respectively. ChatGPT achieved a 90% concordance across all questions. Among correct responses, the concordance was 100%, significantly exceeding that of incorrect responses (n=57, 50%; P<.001). ChatGPT provided innovative insights for 80% (n=132) of all questions, with an average of 2.95 insights per accurate response. Conclusions: Although ChatGPT surpassed the passing threshold for the Chinese Postgraduate Examination for Clinical Medicine, its performance in answering open-ended medical questions was suboptimal. Nonetheless, ChatGPT exhibited high internal concordance and the ability to generate multiple insights in the Chinese language. Future research should investigate the language-based discrepancies in ChatGPT's performance within the health care context.
UR - https://mededu.jmir.org/2024/1/e48514 UR - http://dx.doi.org/10.2196/48514 UR - http://www.ncbi.nlm.nih.gov/pubmed/38335017 ID - info:doi/10.2196/48514 ER - TY - JOUR AU - Meyer, Annika AU - Riese, Janik AU - Streichert, Thomas PY - 2024/2/8 TI - Comparison of the Performance of GPT-3.5 and GPT-4 With That of Medical Students on the Written German Medical Licensing Examination: Observational Study JO - JMIR Med Educ SP - e50965 VL - 10 KW - ChatGPT KW - artificial intelligence KW - large language model KW - medical exams KW - medical examinations KW - medical education KW - LLM KW - public trust KW - trust KW - medical accuracy KW - licensing exam KW - licensing examination KW - improvement KW - patient care KW - general population KW - licensure examination N2 - Background: The potential of artificial intelligence (AI)-based large language models, such as ChatGPT, has gained significant attention in the medical field. This enthusiasm is driven not only by recent breakthroughs and improved accessibility, but also by the prospect of democratizing medical knowledge and promoting equitable health care. However, the performance of ChatGPT is substantially influenced by the input language, and given the growing public trust in this AI tool compared to that in traditional sources of information, investigating its medical accuracy across different languages is of particular importance. Objective: This study aimed to compare the performance of GPT-3.5 and GPT-4 with that of medical students on the written German medical licensing examination. Methods: To assess GPT-3.5's and GPT-4's medical proficiency, we used 937 original multiple-choice questions from 3 written German medical licensing examinations in October 2021, April 2022, and October 2022. Results: GPT-4 achieved an average score of 85% and ranked in the 92.8th, 99.5th, and 92.6th percentiles among medical students who took the same examinations in October 2021, April 2022, and October 2022, respectively.
This represents a substantial improvement of 27% compared to GPT-3.5, which only passed 1 out of the 3 examinations. While GPT-3.5 performed well in psychiatry questions, GPT-4 exhibited strengths in internal medicine and surgery but showed weakness in academic research. Conclusions: The study results highlight ChatGPT's remarkable improvement from moderate (GPT-3.5) to high competency (GPT-4) in answering medical licensing examination questions in German. While GPT-4's predecessor (GPT-3.5) was imprecise and inconsistent, it demonstrates considerable potential to improve medical education and patient care, provided that medically trained users critically evaluate its results. As the replacement of search engines by AI tools seems possible in the future, further studies with nonprofessional questions are needed to assess the safety and accuracy of ChatGPT for the general population. UR - https://mededu.jmir.org/2024/1/e50965 UR - http://dx.doi.org/10.2196/50965 UR - http://www.ncbi.nlm.nih.gov/pubmed/38329802 ID - info:doi/10.2196/50965 ER - TY - JOUR AU - Elyoseph, Zohar AU - Refoua, Elad AU - Asraf, Kfir AU - Lvovsky, Maya AU - Shimoni, Yoav AU - Hadar-Shoval, Dorit PY - 2024/2/6 TI - Capacity of Generative AI to Interpret Human Emotions From Visual and Textual Data: Pilot Evaluation Study JO - JMIR Ment Health SP - e54369 VL - 11 KW - Reading the Mind in the Eyes Test KW - RMET KW - emotional awareness KW - emotional comprehension KW - emotional cue KW - emotional cues KW - ChatGPT KW - large language model KW - LLM KW - large language models KW - LLMs KW - empathy KW - mentalizing KW - mentalization KW - machine learning KW - artificial intelligence KW - AI KW - algorithm KW - algorithms KW - predictive model KW - predictive models KW - predictive analytics KW - predictive system KW - practical model KW - practical models KW - early warning KW - early detection KW - mental health KW - mental disease KW - mental illness KW - mental illnesses KW - mental diseases
N2 - Background: Mentalization, which is integral to human cognitive processes, pertains to the interpretation of one's own and others' mental states, including emotions, beliefs, and intentions. With the advent of artificial intelligence (AI) and the prominence of large language models in mental health applications, questions persist about their aptitude in emotional comprehension. The prior iteration of the large language model from OpenAI, ChatGPT-3.5, demonstrated an advanced capacity to interpret emotions from textual data, surpassing human benchmarks. Given the introduction of ChatGPT-4, with its enhanced visual processing capabilities, and considering Google Bard's existing visual functionalities, a rigorous assessment of their proficiency in visual mentalizing is warranted. Objective: The aim of the research was to critically evaluate the capabilities of ChatGPT-4 and Google Bard with regard to their competence in discerning visual mentalizing indicators as contrasted with their textual-based mentalizing abilities. Methods: The Reading the Mind in the Eyes Test developed by Baron-Cohen and colleagues was used to assess the models' proficiency in interpreting visual emotional indicators. Simultaneously, the Levels of Emotional Awareness Scale was used to evaluate the large language models' aptitude in textual mentalizing. Collating data from both tests provided a holistic view of the mentalizing capabilities of ChatGPT-4 and Bard. Results: ChatGPT-4, displaying a pronounced ability in emotion recognition, secured scores of 26 and 27 in 2 distinct evaluations, significantly deviating from a random response paradigm (P<.001). These scores align with established benchmarks from the broader human demographic. Notably, ChatGPT-4 exhibited consistent responses, with no discernible biases pertaining to the sex of the model or the nature of the emotion.
In contrast, Google Bard's performance aligned with random response patterns, securing scores of 10 and 12 and rendering further detailed analysis redundant. In the domain of textual analysis, both ChatGPT and Bard surpassed established benchmarks from the general population, with their performances being remarkably congruent. Conclusions: ChatGPT-4 proved its efficacy in the domain of visual mentalizing, aligning closely with human performance standards. Although both models displayed commendable acumen in textual emotion interpretation, Bard's capabilities in visual emotion interpretation necessitate further scrutiny and potential refinement. This study stresses the criticality of ethical AI development for emotional recognition, highlighting the need for inclusive data, collaboration with patients and mental health experts, and stringent governmental oversight to ensure transparency and protect patient privacy. UR - https://mental.jmir.org/2024/1/e54369 UR - http://dx.doi.org/10.2196/54369 UR - http://www.ncbi.nlm.nih.gov/pubmed/38319707 ID - info:doi/10.2196/54369 ER - TY - JOUR AU - Gray, Megan AU - Baird, Austin AU - Sawyer, Taylor AU - James, Jasmine AU - DeBroux, Thea AU - Bartlett, Michelle AU - Krick, Jeanne AU - Umoren, Rachel PY - 2024/2/1 TI - Increasing Realism and Variety of Virtual Patient Dialogues for Prenatal Counseling Education Through a Novel Application of ChatGPT: Exploratory Observational Study JO - JMIR Med Educ SP - e50705 VL - 10 KW - prenatal counseling KW - virtual health KW - virtual patient KW - simulation KW - neonatology KW - ChatGPT KW - AI KW - artificial intelligence N2 - Background: Using virtual patients, facilitated by natural language processing, provides a valuable educational experience for learners. Generating a large, varied sample of realistic and appropriate responses for virtual patients is challenging.
Artificial intelligence (AI) programs can be a viable source for these responses, but their utility for this purpose has not been explored. Objective: In this study, we explored the effectiveness of generative AI (ChatGPT) in developing realistic virtual standardized patient dialogues to teach prenatal counseling skills. Methods: ChatGPT was prompted to generate a list of common areas of concern and questions that families expecting preterm delivery at 24 weeks gestation might ask during prenatal counseling. ChatGPT was then prompted to generate 2 role-plays with dialogues between a parent expecting a potential preterm delivery at 24 weeks and their counseling physician using each of the example questions. The prompt was repeated for 2 unique role-plays: one parent was characterized as anxious and the other as having low trust in the medical system. Role-play scripts were exported verbatim and independently reviewed by 2 neonatologists with experience in prenatal counseling, using a scale of 1-5 on realism, appropriateness, and utility for virtual standardized patient responses. Results: ChatGPT generated 7 areas of concern, with 35 example questions used to generate role-plays. The 35 role-play transcripts generated 176 unique parent responses (median 5, IQR 4-6, per role-play) with 268 unique sentences. Expert review identified 117 (65%) of the 176 responses as indicating an emotion, either directly or indirectly. Approximately half (98/176, 56%) of the responses had 2 or more sentences, and half (88/176, 50%) included at least 1 question. More than half (104/176, 58%) of the responses from role-played parent characters described a feeling, such as being scared, worried, or concerned. The role-plays of parents with low trust in the medical system generated many unique sentences (n=50). 
Most of the sentences in the responses were found to be reasonably realistic (214/268, 80%), appropriate for variable prenatal counseling conversation paths (233/268, 87%), and usable without more than a minimal modification in a virtual patient program (169/268, 63%). Conclusions: Generative AI programs, such as ChatGPT, may provide a viable source of training materials to expand virtual patient programs, with careful attention to the concerns and questions of patients and families. Given the potential for unrealistic or inappropriate statements and questions, an expert should review AI chat outputs before deploying them in an educational program. UR - https://mededu.jmir.org/2024/1/e50705 UR - http://dx.doi.org/10.2196/50705 UR - http://www.ncbi.nlm.nih.gov/pubmed/38300696 ID - info:doi/10.2196/50705 ER - TY - JOUR AU - Kavadella, Argyro AU - Dias da Silva, Antonio Marco AU - Kaklamanos, G. Eleftherios AU - Stamatopoulos, Vasileios AU - Giannakopoulos, Kostis PY - 2024/1/31 TI - Evaluation of ChatGPT's Real-Life Implementation in Undergraduate Dental Education: Mixed Methods Study JO - JMIR Med Educ SP - e51344 VL - 10 KW - ChatGPT KW - large language models KW - LLM KW - natural language processing KW - artificial Intelligence KW - dental education KW - higher education KW - learning assignments KW - dental students KW - AI pedagogy KW - dentistry KW - university N2 - Background: The recent artificial intelligence tool ChatGPT seems to offer a range of benefits in academic education while also raising concerns. Relevant literature encompasses issues of plagiarism and academic dishonesty, as well as pedagogy and educational affordances; yet, no real-life implementation of ChatGPT in the educational process has been reported to our knowledge so far. Objective: This mixed methods study aimed to evaluate the implementation of ChatGPT in the educational process, both quantitatively and qualitatively.
Methods: In March 2023, a total of 77 second-year dental students of the European University Cyprus were divided into 2 groups and asked to compose a learning assignment on "Radiation Biology and Radiation Protection in the Dental Office," working collaboratively in small subgroups, as part of the educational semester program of the Dentomaxillofacial Radiology module. Careful planning ensured a seamless integration of ChatGPT, addressing potential challenges. One group searched the internet for scientific resources to perform the task and the other group used ChatGPT for this purpose. Both groups developed a PowerPoint (Microsoft Corp) presentation based on their research and presented it in class. The ChatGPT group students additionally registered all interactions with the language model during the prompting process and evaluated the final outcome; they also answered an open-ended evaluation questionnaire, including questions on their learning experience. Finally, all students undertook a knowledge examination on the topic, and the grades between the 2 groups were compared statistically, whereas the free-text comments of the questionnaires were thematically analyzed. Results: Out of the 77 students, 39 were assigned to the ChatGPT group and 38 to the literature research group. Seventy students undertook the multiple choice question knowledge examination, and examination grades ranged from 5 to 10 on the 0-10 grading scale. The Mann-Whitney U test showed that students of the ChatGPT group performed significantly better (P=.045) than students of the literature research group. The evaluation questionnaires revealed the benefits (human-like interface, immediate response, and wide knowledge base), the limitations (need for rephrasing the prompts to get a relevant answer, general content, false citations, and incapability to provide images or videos), and the prospects (in education, clinical practice, continuing education, and research) of ChatGPT.
Conclusions: Students using ChatGPT for their learning assignments performed significantly better in the knowledge examination than their fellow students who used the literature research methodology. Students adapted quickly to the technological environment of the language model, recognized its opportunities and limitations, and used it creatively and efficiently. Implications for practice: the study underscores the adaptability of students to technological innovations including ChatGPT and its potential to enhance educational outcomes. Educators should consider integrating ChatGPT into curriculum design; awareness programs are warranted to educate both students and educators about the limitations of ChatGPT, encouraging critical engagement and responsible use. UR - https://mededu.jmir.org/2024/1/e51344 UR - http://dx.doi.org/10.2196/51344 UR - http://www.ncbi.nlm.nih.gov/pubmed/38111256 ID - info:doi/10.2196/51344 ER - TY - JOUR AU - Moore, Richard AU - Al-Tamimi, Abdel-Karim AU - Freeman, Elizabeth PY - 2024/1/31 TI - Investigating the Potential of a Conversational Agent (Phyllis) to Support Adolescent Health and Overcome Barriers to Physical Activity: Co-Design Study JO - JMIR Form Res SP - e51571 VL - 8 KW - physical activity KW - inactivity KW - conversational agent KW - CA KW - adolescent KW - public health KW - digital health interventions KW - mobile phone N2 - Background: Conversational agents (CAs) are a promising solution to support people in improving physical activity (PA) behaviors. However, there is a lack of CAs targeted at adolescents that aim to provide support to overcome barriers to PA. This study reports the results of the co-design, development, and evaluation of a prototype CA called "Phyllis" to support adolescents in overcoming barriers to PA with the aim of improving PA behaviors. The study presents one of the first theory-driven CAs that use existing research, a theoretical framework, and a behavior change model.
Objective: The aim of the study is to use a mixed methods approach to investigate the potential of a CA to support adolescents in overcoming barriers to PA and enhance their confidence and motivation to engage in PA. Methods: The methodology involved co-designing with 8 adolescents to create a relational and persuasive CA with a suitable persona and dialogue. The CA was evaluated to determine its acceptability, usability, and effectiveness, with 46 adolescents participating in the study via a web-based survey. Results: The co-design participants were students aged 11 to 13 years, with a sex distribution of 56% (5/9) female and 44% (4/9) male, representing diverse ethnic backgrounds. Participants reported 37 specific barriers to PA, and the most common barriers included a "lack of confidence," "fear of failure," and a "lack of motivation." The CA's persona, named "Phyllis," was co-designed with input from the students, reflecting their preferences for a friendly, understanding, and intelligent personality. Users engaged in 61 conversations with Phyllis and reported a positive user experience, and 73% of them expressed a definite intention to use the fully functional CA in the future, with a net promoter score indicating a high likelihood of recommendation. Phyllis also performed well, being able to recognize a range of different barriers to PA. The CA's persuasive capacity was evaluated in modules focusing on confidence and motivation, with a significant increase in students' agreement in feeling confident and motivated to engage in PA after interacting with Phyllis. Adolescents also expect to have a personalized experience and be able to personalize all aspects of the CA. Conclusions: The results showed high acceptability and a positive user experience, indicating the CA's potential. Promising outcomes were observed, with increasing confidence and motivation for PA.
Further research and development are needed to create further interventions to address other barriers to PA and assess long-term behavior change. Addressing concerns regarding bias and privacy is crucial for achieving acceptability in the future. The CA's potential extends to health care systems and multimodal support, providing valuable insights for designing digital health interventions including tackling global inactivity issues among adolescents. UR - https://formative.jmir.org/2024/1/e51571 UR - http://dx.doi.org/10.2196/51571 UR - http://www.ncbi.nlm.nih.gov/pubmed/38294857 ID - info:doi/10.2196/51571 ER - TY - JOUR AU - Fu, Ziru AU - Hsu, Cheng Yu AU - Chan, S. Christian AU - Lau, Ming Chaak AU - Liu, Joyce AU - Yip, Fai Paul Siu PY - 2024/1/30 TI - Efficacy of ChatGPT in Cantonese Sentiment Analysis: Comparative Study JO - J Med Internet Res SP - e51069 VL - 26 KW - Cantonese KW - ChatGPT KW - counseling KW - natural language processing KW - NLP KW - sentiment analysis N2 - Background: Sentiment analysis is a significant yet difficult task in natural language processing. The linguistic peculiarities of Cantonese, including its high similarity with Standard Chinese, its grammatical and lexical uniqueness, and its colloquialism and multilingualism, make it different from other languages and pose additional challenges to sentiment analysis. Recent advances in models such as ChatGPT offer potential viable solutions. Objective: This study investigated the efficacy of GPT-3.5 and GPT-4 in Cantonese sentiment analysis in the context of web-based counseling and compared their performance with other mainstream methods, including lexicon-based methods and machine learning approaches. Methods: We analyzed transcripts from a web-based, text-based counseling service in Hong Kong, including a total of 131 individual counseling sessions and 6169 messages between counselors and help-seekers. First, a codebook was developed for human annotation.
A simple prompt ("Is the sentiment of this Cantonese text positive, neutral, or negative? Respond with the sentiment label only.") was then given to GPT-3.5 and GPT-4 to label each message's sentiment. GPT-3.5 and GPT-4's performance was compared with a lexicon-based method and 3 state-of-the-art models, including linear regression, support vector machines, and long short-term memory neural networks. Results: Our findings revealed ChatGPT's remarkable accuracy in sentiment classification, with GPT-3.5 and GPT-4, respectively, achieving 92.1% (5682/6169) and 95.3% (5880/6169) accuracy in identifying positive, neutral, and negative sentiment, thereby outperforming the traditional lexicon-based method, which had an accuracy of 37.2% (2295/6169), and the 3 machine learning models, which had accuracies ranging from 66% (4072/6169) to 70.9% (4374/6169). Conclusions: Among many text analysis techniques, ChatGPT demonstrates superior accuracy and emerges as a promising tool for Cantonese sentiment analysis. This study also highlights ChatGPT's applicability in real-world scenarios, such as monitoring the quality of text-based counseling services and detecting message-level sentiments in vivo. The insights derived from this study pave the way for further exploration into the capabilities of ChatGPT in the context of underresourced languages and specialized domains like psychotherapy and natural language processing. UR - https://www.jmir.org/2024/1/e51069 UR - http://dx.doi.org/10.2196/51069 UR - http://www.ncbi.nlm.nih.gov/pubmed/38289662 ID - info:doi/10.2196/51069 ER - TY - JOUR AU - Cheah, Hui Min AU - Gan, Nee Yan AU - Altice, L. Frederick AU - Wickersham, A.
Jeffrey AU - Shrestha, Roman AU - Salleh, Mohd Nur Afiqah AU - Ng, Seong Kee AU - Azwa, Iskandar AU - Balakrishnan, Vimala AU - Kamarulzaman, Adeeba AU - Ni, Zhao PY - 2024/1/26 TI - Testing the Feasibility and Acceptability of Using an Artificial Intelligence Chatbot to Promote HIV Testing and Pre-Exposure Prophylaxis in Malaysia: Mixed Methods Study JO - JMIR Hum Factors SP - e52055 VL - 11 KW - artificial intelligence KW - acceptability KW - chatbot KW - feasibility KW - HIV prevention KW - HIV testing KW - men who have sex with men KW - MSM KW - mobile health KW - mHealth KW - preexposure prophylaxis KW - PrEP KW - mobile phone N2 - Background: The HIV epidemic continues to grow fastest among men who have sex with men (MSM) in Malaysia in the presence of stigma and discrimination. Engaging MSM on the internet using chatbots supported through artificial intelligence (AI) can potentially help HIV prevention efforts. We previously identified the benefits, limitations, and preferred features of HIV prevention AI chatbots and developed an AI chatbot prototype that is now tested for feasibility and acceptability. Objective: This study aims to test the feasibility and acceptability of an AI chatbot in promoting the uptake of HIV testing and pre-exposure prophylaxis (PrEP) in MSM. Methods: We conducted beta testing with 14 MSM from February to April 2022 using Zoom (Zoom Video Communications, Inc). Beta testing involved 3 steps: a 45-minute human-chatbot interaction using the think-aloud method, a 35-minute semistructured interview, and a 10-minute web-based survey. The first 2 steps were recorded, transcribed verbatim, and analyzed using the Unified Theory of Acceptance and Use of Technology. Emerging themes from the qualitative data were mapped on the 4 domains of the Unified Theory of Acceptance and Use of Technology: performance expectancy, effort expectancy, facilitating conditions, and social influence. 
Results: Most participants (13/14, 93%) perceived the chatbot to be useful because it provided comprehensive information on HIV testing and PrEP (performance expectancy). All participants indicated that the chatbot was easy to use because of its simple, straightforward design and quick, friendly responses (effort expectancy). Moreover, 93% (13/14) of the participants rated the overall chatbot quality as high, and all participants perceived the chatbot as a helpful tool and would refer it to others. Approximately 79% (11/14) of the participants agreed they would continue using the chatbot. They suggested adding a local language (ie, Bahasa Malaysia) to customize the chatbot to the Malaysian context (facilitating condition) and suggested that the chatbot should also incorporate more information on mental health, HIV risk assessment, and consequences of HIV. In terms of social influence, all participants perceived the chatbot as helpful in avoiding stigma-inducing interactions and thus could increase the frequency of HIV testing and PrEP uptake among MSM. Conclusions: The current AI chatbot is feasible and acceptable to promote the uptake of HIV testing and PrEP. To ensure the successful implementation and dissemination of AI chatbots in Malaysia, they should be customized to communicate in Bahasa Malaysia and upgraded to provide other HIV-related information to improve usability, such as mental health support, risk assessment for sexually transmitted infections, AIDS treatment, and the consequences of contracting HIV. 
UR - https://humanfactors.jmir.org/2024/1/e52055 UR - http://dx.doi.org/10.2196/52055 UR - http://www.ncbi.nlm.nih.gov/pubmed/38277206 ID - info:doi/10.2196/52055 ER - TY - JOUR AU - Lee, You-Qian AU - Chen, Ching-Tai AU - Chen, Chien-Chang AU - Lee, Chung-Hong AU - Chen, Peitsz AU - Wu, Chi-Shin AU - Dai, Hong-Jie PY - 2024/1/25 TI - Unlocking the Secrets Behind Advanced Artificial Intelligence Language Models in Deidentifying Chinese-English Mixed Clinical Text: Development and Validation Study JO - J Med Internet Res SP - e48443 VL - 26 KW - code mixing KW - electronic health record KW - deidentification KW - pretrained language model KW - large language model KW - ChatGPT N2 - Background: The widespread use of electronic health records in the clinical and biomedical fields makes the removal of protected health information (PHI) essential to maintain privacy. However, a significant portion of information is recorded in unstructured textual forms, posing a challenge for deidentification. In multilingual countries, medical records could be written in a mixture of more than one language, referred to as code mixing. Most current clinical natural language processing techniques are designed for monolingual text, and there is a need to address the deidentification of code-mixed text. Objective: The aim of this study was to investigate the effectiveness and underlying mechanism of fine-tuned pretrained language models (PLMs) in identifying PHI in the code-mixed context. Additionally, we aimed to evaluate the potential of prompting large language models (LLMs) for recognizing PHI in a zero-shot manner. Methods: We compiled the first clinical code-mixed deidentification data set consisting of text written in Chinese and English. We explored the effectiveness of fine-tuned PLMs for recognizing PHI in code-mixed content, with a focus on whether PLMs exploit naming regularity and mention coverage to achieve superior performance, by probing the developed models'
outputs to examine their decision-making process. Furthermore, we investigated the potential of prompt-based in-context learning of LLMs for recognizing PHI in code-mixed text. Results: The developed methods were evaluated on a code-mixed deidentification corpus of 1700 discharge summaries. We observed that different PHI types had preferences in their occurrences within the different types of language-mixed sentences, and PLMs could effectively recognize PHI by exploiting the learned name regularity. However, the models may exhibit suboptimal results when regularity is weak or mentions contain unknown words that the representations cannot generate well. We also found that the availability of code-mixed training instances is essential for the model's performance. Furthermore, the LLM-based deidentification method was a feasible and appealing approach that can be controlled and enhanced through natural language prompts. Conclusions: The study contributes to understanding the underlying mechanism of PLMs in addressing the deidentification process in the code-mixed context and highlights the significance of incorporating code-mixed training instances into the model training phase. To support the advancement of research, we created a manipulated subset of the resynthesized data set available for research purposes. Based on the compiled data set, we found that the LLM-based deidentification method is a feasible approach, but carefully crafted prompts are essential to avoid unwanted output. However, the use of such methods in the hospital setting requires careful consideration of data security and privacy concerns. Further research could explore the augmentation of PLMs and LLMs with external knowledge to improve their strength in recognizing rare PHI.
UR - https://www.jmir.org/2024/1/e48443 UR - http://dx.doi.org/10.2196/48443 UR - http://www.ncbi.nlm.nih.gov/pubmed/38271060 ID - info:doi/10.2196/48443 ER - TY - JOUR AU - Lossio-Ventura, Antonio Juan AU - Weger, Rachel AU - Lee, Y. Angela AU - Guinee, P. Emily AU - Chung, Joyce AU - Atlas, Lauren AU - Linos, Eleni AU - Pereira, Francisco PY - 2024/1/25 TI - A Comparison of ChatGPT and Fine-Tuned Open Pre-Trained Transformers (OPT) Against Widely Used Sentiment Analysis Tools: Sentiment Analysis of COVID-19 Survey Data JO - JMIR Ment Health SP - e50150 VL - 11 KW - sentiment analysis KW - COVID-19 survey KW - large language model KW - few-shot learning KW - zero-shot learning KW - ChatGPT KW - COVID-19 N2 - Background: Health care providers and health-related researchers face significant challenges when applying sentiment analysis tools to health-related free-text survey data. Most state-of-the-art applications were developed in domains such as social media, and their performance in the health care context remains relatively unknown. Moreover, existing studies indicate that these tools often lack accuracy and produce inconsistent results. Objective: This study aims to address the lack of comparative analysis on sentiment analysis tools applied to health-related free-text survey data in the context of COVID-19. The objective was to automatically predict sentence sentiment for 2 independent COVID-19 survey data sets from the National Institutes of Health and Stanford University. Methods: Gold standard labels were created for a subset of each data set using a panel of human raters. We compared 8 state-of-the-art sentiment analysis tools on both data sets to evaluate variability and disagreement across tools. In addition, few-shot learning was explored by fine-tuning Open Pre-Trained Transformers (OPT; a large language model [LLM] with publicly available weights) using a small annotated subset and zero-shot learning using ChatGPT (an LLM without available weights). 
Results: The comparison of sentiment analysis tools revealed high variability and disagreement across the evaluated tools when applied to health-related survey data. OPT and ChatGPT demonstrated superior performance, outperforming all other sentiment analysis tools. Moreover, ChatGPT outperformed OPT, exhibiting 6% higher accuracy and a 4% to 7% higher F-measure. Conclusions: This study demonstrates the effectiveness of LLMs, particularly the few-shot learning and zero-shot learning approaches, in the sentiment analysis of health-related survey data. These results have implications for saving human labor and improving efficiency in sentiment analysis tasks, contributing to advancements in the field of automated sentiment analysis. UR - https://mental.jmir.org/2024/1/e50150 UR - http://dx.doi.org/10.2196/50150 UR - http://www.ncbi.nlm.nih.gov/pubmed/38271138 ID - info:doi/10.2196/50150 ER - TY - JOUR AU - Ulrich, Sandra AU - Gantenbein, R. Andreas AU - Zuber, Viktor AU - Von Wyl, Agnes AU - Kowatsch, Tobias AU - Künzli, Hansjörg PY - 2024/1/24 TI - Development and Evaluation of a Smartphone-Based Chatbot Coach to Facilitate a Balanced Lifestyle in Individuals With Headaches (BalanceUP App): Randomized Controlled Trial JO - J Med Internet Res SP - e50132 VL - 26 KW - chatbot KW - mobile health KW - mHealth KW - smartphone KW - headache management KW - psychoeducation KW - behavior change KW - stress management KW - mental well-being KW - lifestyle KW - mindfulness KW - relaxation KW - mobile phone N2 - Background: Primary headaches, including migraine and tension-type headaches, are widespread and have a social, physical, mental, and economic impact. Among the key components of treatment are behavior interventions such as lifestyle modification. Scalable conversational agents (CAs) have the potential to deliver behavior interventions at a low threshold.
To our knowledge, there is no evidence of behavioral interventions delivered by CAs for the treatment of headaches. Objective: This study has 2 aims. The first aim was to develop and test a smartphone-based coaching intervention (BalanceUP) for people experiencing frequent headaches, delivered by a CA and designed to improve mental well-being using various behavior change techniques. The second aim was to evaluate the effectiveness of BalanceUP by comparing the intervention and waitlist control groups and assess the engagement and acceptance of participants using BalanceUP. Methods: In an unblinded randomized controlled trial, adults with frequent headaches were recruited on the web and in collaboration with experts and allocated to either a CA intervention (BalanceUP) or a control condition. The effects of the treatment on changes in the primary outcome of the study, that is, mental well-being (as measured by the Patient Health Questionnaire Anxiety and Depression Scale), and secondary outcomes (eg, psychosomatic symptoms, stress, headache-related self-efficacy, intention to change behavior, presenteeism and absenteeism, and pain coping) were analyzed using linear mixed models and Cohen d. Primary and secondary outcomes were self-assessed before and after the intervention, and acceptance was assessed after the intervention. Engagement was measured during the intervention using self-reports and usage data. Results: A total of 198 participants (mean age 38.7, SD 12.14 y; n=172, 86.9% women) participated in the study (intervention group: n=110; waitlist control group: n=88). After the intervention, the intention-to-treat analysis revealed evidence for improved well-being (treatment: β estimate=−3.28, 95% CI −5.07 to −1.48) with moderate between-group effects (Cohen d=−0.66, 95% CI −0.99 to −0.33) in favor of the intervention group.
We also found evidence of reduced somatic symptoms, perceived stress, and absenteeism and presenteeism, as well as improved headache management self-efficacy, application of behavior change techniques, and pain coping skills, with effects ranging from medium to large (Cohen d=0.43-1.05). Overall, 64.8% (118/182) of the participants used coaching as intended by engaging throughout the coaching and completing the outro. Conclusions: BalanceUP was well accepted, and the results suggest that coaching delivered by a CA can be effective in reducing the burden of people who experience headaches by improving their well-being. Trial Registration: German Clinical Trials Register DRKS00017422; https://trialsearch.who.int/Trial2.aspx?TrialID=DRKS00017422 UR - https://www.jmir.org/2024/1/e50132 UR - http://dx.doi.org/10.2196/50132 UR - http://www.ncbi.nlm.nih.gov/pubmed/38265863 ID - info:doi/10.2196/50132 ER - TY - JOUR AU - Herrmann-Werner, Anne AU - Festl-Wietek, Teresa AU - Holderried, Friederike AU - Herschbach, Lea AU - Griewatz, Jan AU - Masters, Ken AU - Zipfel, Stephan AU - Mahling, Moritz PY - 2024/1/23 TI - Assessing ChatGPT's Mastery of Bloom's Taxonomy Using Psychosomatic Medicine Exam Questions: Mixed-Methods Study JO - J Med Internet Res SP - e52113 VL - 26 KW - answer KW - artificial intelligence KW - assessment KW - Bloom's taxonomy KW - ChatGPT KW - classification KW - error KW - exam KW - examination KW - generative KW - GPT-4 KW - Generative Pre-trained Transformer 4 KW - language model KW - learning outcome KW - LLM KW - MCQ KW - medical education KW - medical exam KW - multiple-choice question KW - natural language processing KW - NLP KW - psychosomatic KW - question KW - response KW - taxonomy N2 - Background: Large language models such as GPT-4 (Generative Pre-trained Transformer 4) are being increasingly used in medicine and medical education. However, these models are prone to "hallucinations"
(ie, outputs that seem convincing while being factually incorrect). It is currently unknown how these errors by large language models relate to the different cognitive levels defined in Bloom's taxonomy. Objective: This study aims to explore how GPT-4 performs in terms of Bloom's taxonomy using psychosomatic medicine exam questions. Methods: We used a large data set of psychosomatic medicine multiple-choice questions (N=307) with real-world results derived from medical school exams. GPT-4 answered the multiple-choice questions using 2 distinct prompt versions: detailed and short. The answers were analyzed using a quantitative approach and a qualitative approach. Focusing on incorrectly answered questions, we categorized reasoning errors according to the hierarchical framework of Bloom's taxonomy. Results: GPT-4's performance in answering exam questions yielded a high success rate: 93% (284/307) for the detailed prompt and 91% (278/307) for the short prompt. Questions answered correctly by GPT-4 had a statistically significant higher difficulty than questions answered incorrectly (P=.002 for the detailed prompt and P<.001 for the short prompt). Independent of the prompt, GPT-4's lowest exam performance was 78.9% (15/19), thereby always surpassing the "pass" threshold. Our qualitative analysis of incorrect answers, based on Bloom's taxonomy, showed that errors were primarily in the "remember" (29/68) and "understand" (23/68) cognitive levels; specific issues arose in recalling details, understanding conceptual relationships, and adhering to standardized guidelines. Conclusions: GPT-4 demonstrated a remarkable success rate when confronted with psychosomatic medicine multiple-choice exam questions, aligning with previous findings. When evaluated through Bloom's taxonomy, our data revealed that GPT-4 occasionally ignored specific facts (remember), provided illogical reasoning (understand), or failed to apply concepts to a new situation (apply).
These errors, which were confidently presented, could be attributed to inherent model biases and the tendency to generate outputs that maximize likelihood. UR - https://www.jmir.org/2024/1/e52113 UR - http://dx.doi.org/10.2196/52113 UR - http://www.ncbi.nlm.nih.gov/pubmed/38261378 ID - info:doi/10.2196/52113 ER - TY - JOUR AU - Wang, Xin AU - Li, Juan AU - Liang, Tianyi AU - Hasan, Ul Wordh AU - Zaman, Tuz Kimia AU - Du, Yang AU - Xie, Bo AU - Tao, Cui PY - 2024/1/23 TI - Promoting Personalized Reminiscence Among Cognitively Intact Older Adults Through an AI-Driven Interactive Multimodal Photo Album: Development and Usability Study JO - JMIR Aging SP - e49415 VL - 7 KW - aging KW - knowledge graph KW - machine learning KW - reminiscence KW - voice assistant N2 - Background: Reminiscence, a therapy that uses stimulating materials such as old photos and videos to stimulate long-term memory, can improve the emotional well-being and life satisfaction of older adults, including those who are cognitively intact. However, providing personalized reminiscence therapy can be challenging for caregivers and family members. Objective: This study aimed to achieve three objectives: (1) design and develop the GoodTimes app, an interactive multimodal photo album that uses artificial intelligence (AI) to engage users in personalized conversations and storytelling about their pictures, encompassing family, friends, and special moments; (2) examine the app's functionalities in various scenarios using use-case studies and assess the app's usability and user experience through the user study; and (3) investigate the app's potential as a supplementary tool for reminiscence therapy among cognitively intact older adults, aiming to enhance their psychological well-being by facilitating the recollection of past experiences.
Methods: We used state-of-the-art AI technologies, including image recognition, natural language processing, knowledge graph, logic, and machine learning, to develop GoodTimes. First, we constructed a comprehensive knowledge graph that models the information required for effective communication, including photos, people, locations, time, and stories related to the photos. Next, we developed a voice assistant that interacts with users by leveraging the knowledge graph and machine learning techniques. Then, we created various use cases to examine the functions of the system in different scenarios. Finally, to evaluate GoodTimes' usability, we conducted a study with older adults (N=13; age range 58-84, mean 65.8 years). The study was conducted from January to March 2023. Results: The use-case tests demonstrated the performance of GoodTimes in handling a variety of scenarios, highlighting its versatility and adaptability. For the user study, the feedback from our participants was highly positive, with 92% (12/13) reporting a positive experience conversing with GoodTimes. All participants mentioned that the app invoked pleasant memories and aided in recollecting loved ones, resulting in a sense of happiness for the majority (11/13, 85%). Additionally, a significant majority found GoodTimes to be helpful (11/13, 85%) and user-friendly (12/13, 92%). Most participants (9/13, 69%) expressed a desire to use the app frequently, although some (4/13, 31%) indicated a need for technical support to navigate the system effectively. Conclusions: Our AI-based interactive photo album, GoodTimes, was able to engage users in browsing their photos and conversing about them. Preliminary evidence supports GoodTimes' usability and its benefits for cognitively intact older adults. Future work is needed to explore its potential positive effects among older adults with cognitive impairment.
UR - https://aging.jmir.org/2024/1/e49415 UR - http://dx.doi.org/10.2196/49415 UR - http://www.ncbi.nlm.nih.gov/pubmed/38261365 ID - info:doi/10.2196/49415 ER - TY - JOUR AU - Liu, Xiaocong AU - Wu, Jiageng AU - Shao, An AU - Shen, Wenyue AU - Ye, Panpan AU - Wang, Yao AU - Ye, Juan AU - Jin, Kai AU - Yang, Jie PY - 2024/1/22 TI - Uncovering Language Disparity of ChatGPT on Retinal Vascular Disease Classification: Cross-Sectional Study JO - J Med Internet Res SP - e51926 VL - 26 KW - large language models KW - ChatGPT KW - clinical decision support KW - retinal vascular disease KW - artificial intelligence N2 - Background: Benefiting from rich knowledge and the exceptional ability to understand text, large language models like ChatGPT have shown great potential in English clinical environments. However, the performance of ChatGPT in non-English clinical settings, as well as its reasoning, have not been explored in depth. Objective: This study aimed to evaluate ChatGPT's diagnostic performance and inference abilities for retinal vascular diseases in a non-English clinical environment. Methods: In this cross-sectional study, we collected 1226 fundus fluorescein angiography reports and corresponding diagnoses written in Chinese and tested ChatGPT with 4 prompting strategies (direct diagnosis or diagnosis with a step-by-step reasoning process and in Chinese or English). Results: Compared with ChatGPT using Chinese prompts for direct diagnosis that achieved an F1-score of 70.47%, ChatGPT using English prompts for direct diagnosis achieved the best diagnostic performance (80.05%), which was inferior to ophthalmologists (89.35%) but close to ophthalmologist interns (82.69%).
As for its inference abilities, although ChatGPT can derive a reasoning process with a low error rate (0.4 per report) for both Chinese and English prompts, ophthalmologists found that the latter produced more reasoning steps with less incompleteness (44.31%), misinformation (1.96%), and hallucinations (0.59%) (all P<.001). Also, analysis of the robustness of ChatGPT with different language prompts indicated significant differences in the recall (P=.03) and F1-score (P=.04) between Chinese and English prompts. In short, when prompted in English, ChatGPT exhibited enhanced diagnostic and inference capabilities for retinal vascular disease classification based on Chinese fundus fluorescein angiography reports. Conclusions: ChatGPT can serve as a helpful medical assistant to provide diagnosis in non-English clinical environments, but there are still performance gaps, language disparities, and errors compared to professionals, which demonstrate the potential limitations and the need to continually explore more robust large language models in ophthalmology practice. UR - https://www.jmir.org/2024/1/e51926 UR - http://dx.doi.org/10.2196/51926 UR - http://www.ncbi.nlm.nih.gov/pubmed/38252483 ID - info:doi/10.2196/51926 ER - TY - JOUR AU - Sezgin, Emre PY - 2024/1/19 TI - Redefining Virtual Assistants in Health Care: The Future With Large Language Models JO - J Med Internet Res SP - e53225 VL - 26 KW - large language models KW - voice assistants KW - virtual assistants KW - chatbots KW - conversational agents KW - health care UR - https://www.jmir.org/2024/1/e53225 UR - http://dx.doi.org/10.2196/53225 UR - http://www.ncbi.nlm.nih.gov/pubmed/38241074 ID - info:doi/10.2196/53225 ER - TY - JOUR AU - Haddad, Firas AU - Saade, S.
Joanna PY - 2024/1/18 TI - Performance of ChatGPT on Ophthalmology-Related Questions Across Various Examination Levels: Observational Study JO - JMIR Med Educ SP - e50842 VL - 10 KW - ChatGPT KW - artificial intelligence KW - AI KW - board examinations KW - ophthalmology KW - testing N2 - Background: ChatGPT and other large language models have recently gained attention for their ability to answer questions on various examinations across various disciplines. The question of whether ChatGPT could be used to aid in medical education is yet to be answered, particularly in the field of ophthalmology. Objective: The aim of this study is to assess the ability of ChatGPT-3.5 (GPT-3.5) and ChatGPT-4.0 (GPT-4.0) to answer ophthalmology-related questions across different levels of ophthalmology training. Methods: Questions from the United States Medical Licensing Examination (USMLE) steps 1 (n=44), 2 (n=60), and 3 (n=28) were extracted from AMBOSS, and 248 questions (64 easy, 122 medium, and 62 difficult questions) were extracted from the book Ophthalmology Board Review Q&A for the Ophthalmic Knowledge Assessment Program and the Board of Ophthalmology (OB) Written Qualifying Examination (WQE). Questions were prompted identically and inputted to GPT-3.5 and GPT-4.0. Results: GPT-3.5 achieved a total of 55% (n=210) of correct answers, while GPT-4.0 achieved a total of 70% (n=270) of correct answers. GPT-3.5 answered 75% (n=33) of questions correctly in USMLE step 1, 73.33% (n=44) in USMLE step 2, 60.71% (n=17) in USMLE step 3, and 46.77% (n=116) in the OB-WQE. GPT-4.0 answered 70.45% (n=31) of questions correctly in USMLE step 1, 90.32% (n=56) in USMLE step 2, 96.43% (n=27) in USMLE step 3, and 62.90% (n=156) in the OB-WQE. GPT-3.5 performed more poorly as examination levels advanced (P<.001), while GPT-4.0 performed better on USMLE steps 2 and 3 and worse on USMLE step 1 and the OB-WQE (P<.001).
The coefficient of correlation (r) between ChatGPT answering correctly and human users answering correctly was 0.21 (P=.01) for GPT-3.5 as compared to −0.31 (P<.001) for GPT-4.0. GPT-3.5 performed similarly across difficulty levels, while GPT-4.0 performed more poorly with an increase in the difficulty level. Both GPT models performed significantly better on certain topics than on others. Conclusions: ChatGPT is far from being considered a part of mainstream medical education. Future models with higher accuracy are needed for the platform to be effective in medical education. UR - https://mededu.jmir.org/2024/1/e50842 UR - http://dx.doi.org/10.2196/50842 UR - http://www.ncbi.nlm.nih.gov/pubmed/38236632 ID - info:doi/10.2196/50842 ER - TY - JOUR AU - Nguyen, Tina PY - 2024/1/17 TI - ChatGPT in Medical Education: A Precursor for Automation Bias? JO - JMIR Med Educ SP - e50174 VL - 10 KW - ChatGPT KW - artificial intelligence KW - AI KW - medical students KW - residents KW - medical school curriculum KW - medical education KW - automation bias KW - large language models KW - LLMs KW - bias UR - https://mededu.jmir.org/2024/1/e50174 UR - http://dx.doi.org/10.2196/50174 UR - http://www.ncbi.nlm.nih.gov/pubmed/38231545 ID - info:doi/10.2196/50174 ER - TY - JOUR AU - Kuo, I-Hsien Nicholas AU - Perez-Concha, Oscar AU - Hanly, Mark AU - Mnatzaganian, Emmanuel AU - Hao, Brandon AU - Di Sipio, Marcus AU - Yu, Guolin AU - Vanjara, Jash AU - Valerie, Cerelia Ivy AU - de Oliveira Costa, Juliana AU - Churches, Timothy AU - Lujic, Sanja AU - Hegarty, Jo AU - Jorm, Louisa AU - Barbieri, Sebastiano PY - 2024/1/16 TI - Enriching Data Science and Health Care Education: Application and Impact of Synthetic Data Sets Through the Health Gym Project JO - JMIR Med Educ SP - e51388 VL - 10 KW - medical education KW - generative model KW - generative adversarial networks KW - privacy KW - antiretroviral therapy (ART) KW - human immunodeficiency virus (HIV) KW - data science KW - educational
purposes KW - accessibility KW - data privacy KW - data sets KW - sepsis KW - hypotension KW - HIV KW - science education KW - health care AI UR - https://mededu.jmir.org/2024/1/e51388 UR - http://dx.doi.org/10.2196/51388 UR - http://www.ncbi.nlm.nih.gov/pubmed/38227356 ID - info:doi/10.2196/51388 ER - TY - JOUR AU - Long, Cai AU - Lowe, Kayle AU - Zhang, Jessica AU - Santos, dos André AU - Alanazi, Alaa AU - O'Brien, Daniel AU - Wright, D. Erin AU - Cote, David PY - 2024/1/16 TI - A Novel Evaluation Model for Assessing ChatGPT on Otolaryngology–Head and Neck Surgery Certification Examinations: Performance Study JO - JMIR Med Educ SP - e49970 VL - 10 KW - medical licensing KW - otolaryngology KW - otology KW - laryngology KW - ear KW - nose KW - throat KW - ENT KW - surgery KW - surgical KW - exam KW - exams KW - response KW - responses KW - answer KW - answers KW - chatbot KW - chatbots KW - examination KW - examinations KW - medical education KW - otolaryngology/head and neck surgery KW - OHNS KW - artificial intelligence KW - AI KW - ChatGPT KW - medical examination KW - large language models KW - language model KW - LLM KW - LLMs KW - wide range information KW - patient safety KW - clinical implementation KW - safety KW - machine learning KW - NLP KW - natural language processing N2 - Background: ChatGPT is among the most popular large language models (LLMs), exhibiting proficiency in various standardized tests, including multiple-choice medical board examinations. However, its performance on otolaryngology–head and neck surgery (OHNS) certification examinations and open-ended medical board certification examinations has not been reported. Objective: We aimed to evaluate the performance of ChatGPT on OHNS board examinations and propose a novel method to assess an AI model's performance on open-ended medical board examination questions.
Methods: Twenty-one open-ended questions were adopted from the Royal College of Physicians and Surgeons of Canada's sample examination to query ChatGPT on April 11, 2023, with and without prompts. A new model, named Concordance, Validity, Safety, Competency (CVSC), was developed to evaluate its performance. Results: In an open-ended question assessment, ChatGPT achieved a passing mark (an average of 75% across 3 trials) in the attempts and demonstrated higher accuracy with prompts. The model demonstrated high concordance (92.06%) and satisfactory validity. While demonstrating considerable consistency in regenerating answers, it often provided only partially correct responses. Notably, concerning features such as hallucinations and self-conflicting answers were observed. Conclusions: ChatGPT achieved a passing score in the sample examination and demonstrated the potential to pass the OHNS certification examination of the Royal College of Physicians and Surgeons of Canada. Some concerns remain due to its hallucinations, which could pose risks to patient safety. Further adjustments are necessary to yield safer and more accurate answers for clinical implementation.
UR - https://mededu.jmir.org/2024/1/e49970 UR - http://dx.doi.org/10.2196/49970 UR - http://www.ncbi.nlm.nih.gov/pubmed/38227351 ID - info:doi/10.2196/49970 ER - TY - JOUR AU - Al-Worafi, Mohammed Yaser AU - Goh, Wen Khang AU - Hermansyah, Andi AU - Tan, Siang Ching AU - Ming, Chiau Long PY - 2024/1/12 TI - The Use of ChatGPT for Education Modules on Integrated Pharmacotherapy of Infectious Disease: Educators' Perspectives JO - JMIR Med Educ SP - e47339 VL - 10 KW - innovation and technology KW - quality education KW - sustainable communities KW - innovation and infrastructure KW - partnerships for the goals KW - sustainable education KW - social justice KW - ChatGPT KW - artificial intelligence KW - feasibility N2 - Background: Artificial Intelligence (AI) plays an important role in many fields, including medical education, practice, and research. Many medical educators started using ChatGPT at the end of 2022 for many purposes. Objective: The aim of this study was to explore the potential uses, benefits, and risks of using ChatGPT in education modules on integrated pharmacotherapy of infectious disease. Methods: A content analysis was conducted to investigate the applications of ChatGPT in education modules on integrated pharmacotherapy of infectious disease. Questions pertaining to curriculum development, syllabus design, lecture note preparation, and examination construction were posed during data collection. Three experienced professors rated the appropriateness and precision of the answers provided by ChatGPT. The consensus rating was considered. The professors also discussed the prospective applications, benefits, and risks of ChatGPT in this educational setting. Results: ChatGPT demonstrated the ability to contribute to various aspects of curriculum design, with ratings ranging from 50% to 92% for appropriateness and accuracy. 
However, there were limitations and risks associated with its use, including incomplete syllabi, the absence of essential learning objectives, and the inability to design valid questionnaires and qualitative studies. It was suggested that educators use ChatGPT as a resource rather than relying primarily on its output. There are recommendations for effectively incorporating ChatGPT into the curriculum of the education modules on integrated pharmacotherapy of infectious disease. Conclusions: Medical and health sciences educators can use ChatGPT as a guide in many aspects related to the development of the curriculum of the education modules on integrated pharmacotherapy of infectious disease, syllabus design, lecture notes preparation, and examination preparation with caution. UR - https://mededu.jmir.org/2024/1/e47339 UR - http://dx.doi.org/10.2196/47339 UR - http://www.ncbi.nlm.nih.gov/pubmed/38214967 ID - info:doi/10.2196/47339 ER - TY - JOUR AU - Guo, Eddie AU - Gupta, Mehul AU - Deng, Jiawen AU - Park, Ye-Jean AU - Paget, Michael AU - Naugler, Christopher PY - 2024/1/12 TI - Automated Paper Screening for Clinical Reviews Using Large Language Models: Data Analysis Study JO - J Med Internet Res SP - e48996 VL - 26 KW - abstract screening KW - Chat GPT KW - classification KW - extract KW - extraction KW - free text KW - GPT KW - GPT-4 KW - language model KW - large language models KW - LLM KW - natural language processing KW - NLP KW - nonopioid analgesia KW - review methodology KW - review methods KW - screening KW - systematic review KW - systematic KW - unstructured data N2 - Background: The systematic review of clinical research papers is a labor-intensive and time-consuming process that often involves the screening of thousands of titles and abstracts. The accuracy and efficiency of this process are critical for the quality of the review and subsequent health care decisions. 
Traditional methods rely heavily on human reviewers, often requiring a significant investment of time and resources. Objective: This study aims to assess the performance of the OpenAI generative pretrained transformer (GPT) and GPT-4 application programming interfaces (APIs) in accurately and efficiently identifying relevant titles and abstracts from real-world clinical review data sets and comparing their performance against ground truth labeling by 2 independent human reviewers. Methods: We introduce a novel workflow using the ChatGPT and GPT-4 APIs for screening titles and abstracts in clinical reviews. A Python script was created to make calls to the API with the screening criteria in natural language and a corpus of title and abstract data sets filtered by a minimum of 2 human reviewers. We compared the performance of our model against human-reviewed papers across 6 review papers, screening over 24,000 titles and abstracts. Results: Our results show an accuracy of 0.91, a macro F1-score of 0.60, a sensitivity of excluded papers of 0.91, and a sensitivity of included papers of 0.76. The interrater variability between 2 independent human screeners was κ=0.46, and the prevalence- and bias-adjusted κ between our proposed methods and the consensus-based human decisions was κ=0.96. On a randomly selected subset of papers, the GPT models demonstrated the ability to provide reasoning for their decisions and corrected their initial decisions upon being asked to explain their reasoning for incorrect classifications. Conclusions: Large language models have the potential to streamline the clinical review process, save valuable time and effort for researchers, and contribute to the overall quality of clinical reviews. By prioritizing the workflow and acting as an aid rather than a replacement for researchers and reviewers, models such as GPT-4 can enhance efficiency and lead to more accurate and reliable conclusions in medical research. 
UR - https://www.jmir.org/2024/1/e48996 UR - http://dx.doi.org/10.2196/48996 UR - http://www.ncbi.nlm.nih.gov/pubmed/38214966 ID - info:doi/10.2196/48996 ER - TY - JOUR AU - Nguyen, C. Quynh AU - Aparicio, M. Elizabeth AU - Jasczynski, Michelle AU - Channell Doig, Amara AU - Yue, Xiaohe AU - Mane, Heran AU - Srikanth, Neha AU - Gutierrez, Marin Francia Ximena AU - Delcid, Nataly AU - He, Xin AU - Boyd-Graber, Jordan PY - 2024/1/12 TI - Rosie, a Health Education Question-and-Answer Chatbot for New Mothers: Randomized Pilot Study JO - JMIR Form Res SP - e51361 VL - 8 KW - chatbot KW - health information KW - maternal and child health KW - health disparities KW - health equity KW - health informatics KW - preventive health care KW - postpartum care KW - patient education KW - newborn care KW - prenatal care KW - mobile phone N2 - Background: Stark disparities exist in maternal and child outcomes, and there is a need to provide timely and accurate health information. Objective: In this pilot study, we assessed the feasibility and acceptability of a health chatbot for new mothers of color. Methods: Rosie, a question-and-answer chatbot, was developed as a mobile app and is available to answer questions about pregnancy, parenting, and child development. From January 9, 2023, to February 9, 2023, participants were recruited using social media posts and through engagement with community organizations. Inclusion criteria included being aged ≥14 years, being a woman of color, and either being currently pregnant or having given birth within the past 6 months. Participants were randomly assigned to the Rosie treatment group (15/29, 52% received the Rosie app) or control group (14/29, 48% received a children's book each month) for 3 months. Those assigned to the treatment group could ask Rosie questions and receive an immediate response generated from Rosie's knowledge base. 
Upon detection of a possible health emergency, Rosie sends emergency resources and relevant hotline information. In addition, a study staff member, who is a clinical social worker, reaches out to the participant within 24 hours to follow up. Preintervention and postintervention tests were completed to qualitatively and quantitatively evaluate Rosie and describe changes across key health outcomes, including postpartum depression and the frequency of emergency room visits. These measurements were used to inform the clinical trial's sample size calculations. Results: Of 41 individuals who were screened and eligible, 31 (76%) enrolled and 29 (71%) were retained in the study. Most Rosie treatment group members (13/15, 87%) reported using Rosie daily (5/15, 33%) or weekly (8/15, 53%) across the 3-month study period. Most users reported that Rosie was easy to use (14/15, 93%) and provided responses quickly (13/15, 87%). The remaining issues identified included crashing of the app (8/15, 53%) and dissatisfaction with some of Rosie's answers (12/15, 80%). Mothers in both the Rosie treatment group and control group experienced a decline in depression scores from pretest to posttest periods, but the decline was statistically significant only among treatment group mothers (P=.008). In addition, a low proportion of treatment group infants had emergency room visits (1/11, 9%) compared with control group members (3/13, 23%). Nonetheless, no between-group differences reached statistical significance at P<.05. Conclusions: Rosie was found to be an acceptable, feasible, and appropriate intervention for ethnic and racial minority pregnant women and mothers of infants owing to the chatbot's ability to provide a personalized, flexible tool to increase the timeliness and accessibility of high-quality health information to individuals during a period of elevated health risks for the mother and child. 
Trial Registration: ClinicalTrials.gov NCT06053515; https://clinicaltrials.gov/study/NCT06053515 UR - https://formative.jmir.org/2024/1/e51361 UR - http://dx.doi.org/10.2196/51361 UR - http://www.ncbi.nlm.nih.gov/pubmed/38214963 ID - info:doi/10.2196/51361 ER - TY - JOUR AU - Zaleski, L. Amanda AU - Berkowsky, Rachel AU - Craig, Thomas Kelly Jean AU - Pescatello, S. Linda PY - 2024/1/11 TI - Comprehensiveness, Accuracy, and Readability of Exercise Recommendations Provided by an AI-Based Chatbot: Mixed Methods Study JO - JMIR Med Educ SP - e51308 VL - 10 KW - exercise prescription KW - health literacy KW - large language model KW - patient education KW - artificial intelligence KW - AI KW - chatbot N2 - Background: Regular physical activity is critical for health and disease prevention. Yet, health care providers and patients face barriers to implementing evidence-based lifestyle recommendations. The potential to augment care with the increased availability of artificial intelligence (AI) technologies is limitless; however, the suitability of AI-generated exercise recommendations has yet to be explored. Objective: The purpose of this study was to assess the comprehensiveness, accuracy, and readability of individualized exercise recommendations generated by a novel AI chatbot. Methods: A coding scheme was developed to score AI-generated exercise recommendations across 10 categories informed by gold-standard exercise recommendations, including (1) health condition-specific benefits of exercise, (2) exercise preparticipation health screening, (3) frequency, (4) intensity, (5) time, (6) type, (7) volume, (8) progression, (9) special considerations, and (10) references to the primary literature. The AI chatbot was prompted to provide individualized exercise recommendations for 26 clinical populations using an open-source application programming interface. 
Two independent reviewers coded AI-generated content for each category and calculated comprehensiveness (%) and factual accuracy (%) on a scale of 0%-100%. Readability was assessed using the Flesch-Kincaid formula. Qualitative analysis identified and categorized themes from AI-generated output. Results: AI-generated exercise recommendations were 41.2% (107/260) comprehensive and 90.7% (146/161) accurate, with the majority (8/15, 53%) of inaccuracy related to the need for exercise preparticipation medical clearance. Average readability level of AI-generated exercise recommendations was at the college level (mean 13.7, SD 1.7), with an average Flesch reading ease score of 31.1 (SD 7.7). Several recurring themes and observations of AI-generated output included concern for liability and safety, preference for aerobic exercise, and potential bias and direct discrimination against certain age-based populations and individuals with disabilities. Conclusions: There were notable gaps in the comprehensiveness, accuracy, and readability of AI-generated exercise recommendations. Exercise and health care professionals should be aware of these limitations when using and endorsing AI-based technologies as a tool to support lifestyle change involving exercise. UR - https://mededu.jmir.org/2024/1/e51308 UR - http://dx.doi.org/10.2196/51308 UR - http://www.ncbi.nlm.nih.gov/pubmed/38206661 ID - info:doi/10.2196/51308 ER - TY - JOUR AU - Farrand, Paul AU - Raue, J. 
Patrick AU - Ward, Earlise AU - Repper, Dean AU - Areán, Patricia PY - 2024/1/10 TI - Use and Engagement With Low-Intensity Cognitive Behavioral Therapy Techniques Used Within an App to Support Worry Management: Content Analysis of Log Data JO - JMIR Mhealth Uhealth SP - e47321 VL - 12 KW - cognitive behavioral therapy KW - low-intensity KW - mCBT KW - app KW - log data KW - worry management KW - CBT KW - management KW - application KW - therapy KW - implementation KW - treatment KW - symptoms KW - anxiety KW - worry KW - engagement N2 - Background: Low-intensity cognitive behavioral therapy (LICBT) has been implemented by the Improving Access to Psychological Therapies services across England to manage excessive worry associated with generalized anxiety disorder and support emotional well-being. However, barriers to access limit scalability. A solution has been to incorporate LICBT techniques derived from an evidence-based protocol within the Iona Mind Well-being app for Worry management (IMWW) with support provided through an algorithmically driven conversational agent. Objective: This study aims to examine engagement with a mobile phone app to support worry management, with specific attention directed toward interaction with specific LICBT techniques, and to examine the potential to reduce symptoms of anxiety. Methods: Log data were examined with respect to a sample of "engaged" users who had completed at least 1 lesson related to the Worry Time and Problem Solving in-app modules that represented the "minimum dose." Paired sample 2-tailed t tests were undertaken to examine the potential for IMWW to reduce worry and anxiety, with multivariate linear regressions examining the extent to which completion of each of the techniques led to reductions in worry and anxiety. Results: There was good engagement with the range of specific LICBT techniques included within IMWW. 
The vast majority of engaged users were able to interact with the cognitive behavioral therapy model and successfully record types of worry. When working through Problem Solving, the conversational agent was successfully used to support the user with lower levels of engagement. Several users engaged with Worry Time outside of the app. Forgetting to use the app was the most common reason for lack of engagement, with features of the app such as completion of routine outcome measures and weekly reflections having lower levels of engagement. Despite difficulties in the collection of end point data, there was a significant reduction in severity for both anxiety (t53=5.5; P<.001; 95% CI 2.4-5.2) and low mood (t53=2.3; P=.03; 95% CI 0.2-3.3). A statistically significant linear model was also fitted to the Generalized Anxiety Disorder-7 (F2,51=6.73; P<.001), while the model predicting changes in the Patient Health Questionnaire-8 did not reach significance (F2,51=2.33; P=.11). This indicates that the reduction in these measures was affected by in-app engagement with Worry Time and Problem Solving. Conclusions: Engaged users were able to successfully interact with the LICBT-specific techniques informed by an evidence-based protocol although there were lower completion rates of routine outcome measures and weekly reflections. Successful interaction with the specific techniques potentially contributes to promising data, indicating that IMWW may be effective in the management of excessive worry. A relationship between dose and improvement justifies the use of log data to inform future developments. However, attention needs to be directed toward enhancing interaction with wider features of the app given that larger improvements were associated with greater engagement. 
UR - https://mhealth.jmir.org/2024/1/e47321 UR - http://dx.doi.org/10.2196/47321 UR - http://www.ncbi.nlm.nih.gov/pubmed/38029300 ID - info:doi/10.2196/47321 ER - TY - JOUR AU - Jiang, Zhili AU - Huang, Xiting AU - Wang, Zhiqian AU - Liu, Yang AU - Huang, Lihua AU - Luo, Xiaolin PY - 2024/1/9 TI - Embodied Conversational Agents for Chronic Diseases: Scoping Review JO - J Med Internet Res SP - e47134 VL - 26 KW - embodied conversational agent KW - ECA KW - chronic diseases KW - eHealth KW - health care KW - mobile phone N2 - Background: Embodied conversational agents (ECAs) are computer-generated animated humanlike characters that interact with users through verbal and nonverbal behavioral cues. They are increasingly used in a range of fields, including health care. Objective: This scoping review aims to identify the current practice in the development and evaluation of ECAs for chronic diseases. Methods: We applied a methodological framework in this review. A total of 6 databases (ie, PubMed, Embase, CINAHL, ACM Digital Library, IEEE Xplore Digital Library, and Web of Science) were searched using a combination of terms related to ECAs and health in October 2023. Two independent reviewers selected the studies and extracted the data. This review followed the PRISMA-ScR (Preferred Reporting Items of Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) statement. Results: The literature search found 6332 papers, of which 36 (0.57%) met the inclusion criteria. Among the 36 studies, 27 (75%) originated from the United States, and 28 (78%) were published from 2020 onward. The reported ECAs covered a wide range of chronic diseases, with a focus on cancers, atrial fibrillation, and type 2 diabetes, primarily to promote screening and self-management. Most ECAs were depicted as middle-aged women based on screenshots and communicated with users through voice and nonverbal behavior. 
The most frequently reported evaluation outcomes were acceptability and effectiveness. Conclusions: This scoping review provides valuable insights for technology developers and health care professionals regarding the development and implementation of ECAs. It emphasizes the importance of technological advances in the embodiment, personalized strategy, and communication modality and requires in-depth knowledge of user preferences regarding appearance, animation, and intervention content. Future studies should incorporate measures of cost, efficiency, and productivity to provide a comprehensive evaluation of the benefits of using ECAs in health care. UR - https://www.jmir.org/2024/1/e47134 UR - http://dx.doi.org/10.2196/47134 UR - http://www.ncbi.nlm.nih.gov/pubmed/38194260 ID - info:doi/10.2196/47134 ER - TY - JOUR AU - Lim, Adrian Wendell AU - Custodio, Razel AU - Sunga, Monica AU - Amoranto, Jayne Abegail AU - Sarmiento, Francis Raymond PY - 2024/1/5 TI - General Characteristics and Design Taxonomy of Chatbots for COVID-19: Systematic Review JO - J Med Internet Res SP - e43112 VL - 26 KW - COVID-19 KW - health chatbot KW - conversational agent in health care KW - artificial intelligence KW - systematic review KW - mobile phone N2 - Background: A conversational agent powered by artificial intelligence, commonly known as a chatbot, is one of the most recent innovations used to provide information and services during the COVID-19 pandemic. However, the multitude of conversational agents explicitly designed during the COVID-19 pandemic calls for characterization and analysis using rigorous technological frameworks and extensive systematic reviews. Objective: This study aims to describe the general characteristics of COVID-19 chatbots and examine their system designs using a modified adapted design taxonomy framework. 
Methods: We conducted a systematic review of the general characteristics and design taxonomy of COVID-19 chatbots, with 56 studies included in the final analysis. This review followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines to select papers published between March 2020 and April 2022 from various databases and search engines. Results: Most studies on COVID-19 chatbot design and development worldwide are implemented in Asia and Europe. Most chatbots are also accessible on websites, internet messaging apps, and Android devices. The COVID-19 chatbots are further classified according to their temporal profiles, appearance, intelligence, interaction, and context for system design trends. From the temporal profile perspective, almost half of the COVID-19 chatbots interact with users more than once over a period of several weeks and can remember information from previous user interactions. From the appearance perspective, most COVID-19 chatbots assume the expert role, are task oriented, and have no visual or avatar representation. From the intelligence perspective, almost half of the COVID-19 chatbots are artificially intelligent and can respond to textual inputs and a set of rules. In addition, more than half of these chatbots operate on a structured flow and do not portray any socioemotional behavior. Most chatbots can also process external data and broadcast resources. Regarding their interaction with users, most COVID-19 chatbots are adaptive, can communicate through text, can react to user input, are not gamified, and do not require additional human support. From the context perspective, all COVID-19 chatbots are goal oriented, although most fall under the health care application domain and are designed to provide information to the user. Conclusions: The conceptualization, development, implementation, and use of COVID-19 chatbots emerged to mitigate the effects of a global pandemic in societies worldwide. 
This study summarized the current system design trends of COVID-19 chatbots based on 5 design perspectives, which may help developers conveniently choose a future-proof chatbot archetype that will meet the needs of the public in the face of growing demand for a better pandemic response. UR - https://www.jmir.org/2024/1/e43112 UR - http://dx.doi.org/10.2196/43112 UR - http://www.ncbi.nlm.nih.gov/pubmed/38064638 ID - info:doi/10.2196/43112 ER - TY - JOUR AU - Weidener, Lukas AU - Fischer, Michael PY - 2024/1/5 TI - Artificial Intelligence in Medicine: Cross-Sectional Study Among Medical Students on Application, Education, and Ethical Aspects JO - JMIR Med Educ SP - e51247 VL - 10 KW - artificial intelligence KW - AI technology KW - medicine KW - medical education KW - medical curriculum KW - medical school KW - AI ethics KW - ethics N2 - Background: The use of artificial intelligence (AI) in medicine not only directly impacts the medical profession but is also increasingly associated with various potential ethical aspects. In addition, the expanding use of AI and AI-based applications such as ChatGPT demands a corresponding shift in medical education to adequately prepare future practitioners for the effective use of these tools and address the associated ethical challenges they present. Objective: This study aims to explore how medical students from Germany, Austria, and Switzerland perceive the use of AI in medicine and the teaching of AI and AI ethics in medical education in accordance with their use of AI-based chat applications, such as ChatGPT. Methods: This cross-sectional study, conducted from June 15 to July 15, 2023, surveyed medical students across Germany, Austria, and Switzerland using a web-based survey. This study aimed to assess students' perceptions of AI in medicine and the integration of AI and AI ethics into medical education. The survey, which included 53 items across 6 sections, was developed and pretested. 
Data analysis used descriptive statistics (median, mode, IQR, total number, and percentages) and either the chi-square or Mann-Whitney U tests, as appropriate. Results: Surveying 487 medical students across Germany, Austria, and Switzerland revealed limited formal education on AI or AI ethics within medical curricula, although 38.8% (189/487) had prior experience with AI-based chat applications, such as ChatGPT. Despite varied prior exposures, 71.7% (349/487) anticipated a positive impact of AI on medicine. There was widespread consensus (385/487, 74.9%) on the need for AI and AI ethics instruction in medical education, although the current offerings were deemed inadequate. Regarding the AI ethics education content, all proposed topics were rated as highly relevant. Conclusions: This study revealed a pronounced discrepancy between the use of AI-based (chat) applications, such as ChatGPT, among medical students in Germany, Austria, and Switzerland and the teaching of AI in medical education. To adequately prepare future medical professionals, there is an urgent need to integrate the teaching of AI and AI ethics into the medical curricula. UR - https://mededu.jmir.org/2024/1/e51247 UR - http://dx.doi.org/10.2196/51247 UR - http://www.ncbi.nlm.nih.gov/pubmed/38180787 ID - info:doi/10.2196/51247 ER - TY - JOUR AU - Knoedler, Leonard AU - Alfertshofer, Michael AU - Knoedler, Samuel AU - Hoch, C. Cosima AU - Funk, F. Paul AU - Cotofana, Sebastian AU - Maheta, Bhagvat AU - Frank, Konstantin AU - Brébant, Vanessa AU - Prantl, Lukas AU - Lamby, Philipp PY - 2024/1/5 TI - Pure Wisdom or Potemkin Villages? 
A Comparison of ChatGPT 3.5 and ChatGPT 4 on USMLE Step 3 Style Questions: Quantitative Analysis JO - JMIR Med Educ SP - e51148 VL - 10 KW - ChatGPT KW - United States Medical Licensing Examination KW - artificial intelligence KW - USMLE KW - USMLE Step 1 KW - OpenAI KW - medical education KW - clinical decision-making N2 - Background: The United States Medical Licensing Examination (USMLE) has been critical in medical education since 1992, testing various aspects of a medical student's knowledge and skills through different steps, based on their training level. Artificial intelligence (AI) tools, including chatbots like ChatGPT, are emerging technologies with potential applications in medicine. However, comprehensive studies analyzing ChatGPT's performance on USMLE Step 3 in large-scale scenarios and comparing different versions of ChatGPT are limited. Objective: This paper aimed to analyze ChatGPT's performance on USMLE Step 3 practice test questions to better elucidate the strengths and weaknesses of AI use in medical education and deduce evidence-based strategies to counteract AI cheating. Methods: A total of 2069 USMLE Step 3 practice questions were extracted from the AMBOSS study platform. After excluding 229 image-based questions, a total of 1840 text-based questions were further categorized and entered into ChatGPT 3.5, while a subset of 229 questions were entered into ChatGPT 4. Responses were recorded, and the accuracy of ChatGPT answers as well as its performance in different test question categories and for different difficulty levels were compared between both versions. Results: Overall, ChatGPT 4 demonstrated a statistically significant superior performance compared to ChatGPT 3.5, achieving an accuracy of 84.7% (194/229) and 56.9% (1047/1840), respectively. A noteworthy correlation was observed between the length of test questions and the performance of ChatGPT 3.5 (ρ=−0.069; P=.003), which was absent in ChatGPT 4 (P=.87). 
Additionally, the difficulty of test questions, as categorized by AMBOSS hammer ratings, showed a statistically significant correlation with performance for both ChatGPT versions, with ρ=−0.289 for ChatGPT 3.5 and ρ=−0.344 for ChatGPT 4. ChatGPT 4 surpassed ChatGPT 3.5 in all levels of test question difficulty, except for the 2 highest difficulty tiers (4 and 5 hammers), where statistical significance was not reached. Conclusions: In this study, ChatGPT 4 demonstrated remarkable proficiency in taking the USMLE Step 3, with an accuracy rate of 84.7% (194/229), outshining ChatGPT 3.5 with an accuracy rate of 56.9% (1047/1840). Although ChatGPT 4 performed exceptionally, it encountered difficulties in questions requiring the application of theoretical concepts, particularly in cardiology and neurology. These insights are pivotal for the development of examination strategies that are resilient to AI and underline the promising role of AI in the realm of medical education and diagnostics. UR - https://mededu.jmir.org/2024/1/e51148 UR - http://dx.doi.org/10.2196/51148 UR - http://www.ncbi.nlm.nih.gov/pubmed/38180782 ID - info:doi/10.2196/51148 ER - TY - JOUR AU - Blease, Charlotte AU - Torous, John AU - McMillan, Brian AU - Hägglund, Maria AU - Mandl, D. Kenneth PY - 2024/1/4 TI - Generative Language Models and Open Notes: Exploring the Promise and Limitations JO - JMIR Med Educ SP - e51183 VL - 10 KW - ChatGPT KW - generative language models KW - large language models KW - medical education KW - Open Notes KW - online record access KW - patient-centered care KW - empathy KW - language model KW - documentation KW - communication tool KW - clinical documentation UR - https://mededu.jmir.org/2024/1/e51183 UR - http://dx.doi.org/10.2196/51183 UR - http://www.ncbi.nlm.nih.gov/pubmed/38175688 ID - info:doi/10.2196/51183 ER - TY - JOUR AU - Erren, C. 
Thomas PY - 2024/1/4 TI - Patients, Doctors, and Chatbots JO - JMIR Med Educ SP - e50869 VL - 10 KW - chatbot KW - ChatGPT KW - medical advice KW - ethics KW - patients KW - doctors UR - https://mededu.jmir.org/2024/1/e50869 UR - http://dx.doi.org/10.2196/50869 UR - http://www.ncbi.nlm.nih.gov/pubmed/38175695 ID - info:doi/10.2196/50869 ER - TY - JOUR AU - Wang, Changyu AU - Liu, Siru AU - Li, Aiqing AU - Liu, Jialin PY - 2023/12/29 TI - Text Dialogue Analysis for Primary Screening of Mild Cognitive Impairment: Development and Validation Study JO - J Med Internet Res SP - e51501 VL - 25 KW - artificial intelligence KW - AI KW - AI models KW - ChatGPT KW - primary screening KW - mild cognitive impairment KW - standardization KW - prompt design KW - design KW - cognitive impairment KW - screening KW - model KW - clinician KW - diagnosis N2 - Background: Artificial intelligence models tailored to diagnose cognitive impairment have shown excellent results. However, it is unclear whether large language models can rival specialized models using text alone. Objective: In this study, we explored the performance of ChatGPT for primary screening of mild cognitive impairment (MCI) and standardized the design steps and components of the prompts. Methods: We gathered a total of 174 participants from the DementiaBank screening and classified 70% of them into the training set and 30% of them into the test set. Only text dialogues were kept. Sentences were cleaned using a macro code, followed by a manual check. The prompt consisted of 5 main parts, including character setting, scoring system setting, indicator setting, output setting, and explanatory information setting. Three dimensions of variables from published studies were included: vocabulary (ie, word frequency and word ratio, phrase frequency and phrase ratio, and lexical complexity), syntax and grammar (ie, syntactic complexity and grammatical components), and semantics (ie, semantic density and semantic coherence). 
We used R 4.3.0 for the analysis of variables and diagnostic indicators. Results: Three additional indicators related to the severity of MCI were incorporated into the final prompt for the model. These indicators were effective in discriminating between MCI and cognitively normal participants: tip-of-the-tongue phenomenon (P<.001), difficulty with complex ideas (P<.001), and memory issues (P<.001). The final GPT-4 model achieved a sensitivity of 0.8636, a specificity of 0.9487, and an area under the curve of 0.9062 on the training set; on the test set, the sensitivity, specificity, and area under the curve reached 0.7727, 0.8333, and 0.8030, respectively. Conclusions: ChatGPT was effective in the primary screening of participants with possible MCI. Improved standardization of prompts by clinicians would also improve the performance of the model. It is important to note that ChatGPT is not a substitute for a clinician making a diagnosis. UR - https://www.jmir.org/2023/1/e51501 UR - http://dx.doi.org/10.2196/51501 UR - http://www.ncbi.nlm.nih.gov/pubmed/38157230 ID - info:doi/10.2196/51501 ER - TY - JOUR AU - Giannakopoulos, Kostis AU - Kavadella, Argyro AU - Aaqel Salim, Anas AU - Stamatopoulos, Vassilis AU - Kaklamanos, G. 
Eleftherios PY - 2023/12/28 TI - Evaluation of the Performance of Generative AI Large Language Models ChatGPT, Google Bard, and Microsoft Bing Chat in Supporting Evidence-Based Dentistry: Comparative Mixed Methods Study JO - J Med Internet Res SP - e51580 VL - 25 KW - artificial intelligence KW - AI KW - large language models KW - generative pretrained transformers KW - evidence-based dentistry KW - ChatGPT KW - Google Bard KW - Microsoft Bing KW - clinical practice KW - dental professional KW - dental practice KW - clinical decision-making KW - clinical practice guidelines N2 - Background: The increasing application of generative artificial intelligence large language models (LLMs) in various fields, including dentistry, raises questions about their accuracy. Objective: This study aims to comparatively evaluate the answers provided by 4 LLMs, namely Bard (Google LLC), ChatGPT-3.5 and ChatGPT-4 (OpenAI), and Bing Chat (Microsoft Corp), to clinically relevant questions from the field of dentistry. Methods: The LLMs were queried with 20 open-type, clinical dentistry-related questions from different disciplines, developed by the respective faculty of the School of Dentistry, European University Cyprus. The LLMs' answers were graded 0 (minimum) to 10 (maximum) points against strong, traditionally collected scientific evidence, such as guidelines and consensus statements, using a rubric, as if they were examination questions posed to students, by 2 experienced faculty members. The scores were statistically compared to identify the best-performing model using the Friedman and Wilcoxon tests. Moreover, the evaluators were asked to provide a qualitative evaluation of the comprehensiveness, scientific accuracy, clarity, and relevance of the LLMs' answers. Results: Overall, no statistically significant difference was detected between the scores given by the 2 evaluators; therefore, an average score was computed for every LLM. 
Although ChatGPT-4 statistically outperformed ChatGPT-3.5 (P=.008), Bing Chat (P=.049), and Bard (P=.045), all models occasionally exhibited inaccuracies, generality, outdated content, and a lack of source references. The evaluators noted instances where the LLMs delivered irrelevant information, vague answers, or information that was not fully accurate. Conclusions: This study demonstrates that although LLMs hold promising potential as an aid in the implementation of evidence-based dentistry, their current limitations can lead to potentially harmful health care decisions if not used judiciously. Therefore, these tools should not replace the dentist's critical thinking and in-depth understanding of the subject matter. Further research, clinical validation, and model improvements are necessary for these tools to be fully integrated into dental practice. Dental practitioners must be aware of the limitations of LLMs, as their imprudent use could potentially impact patient care. Regulatory measures should be established to oversee the use of these evolving technologies. 
UR - https://www.jmir.org/2023/1/e51580 UR - http://dx.doi.org/10.2196/51580 UR - http://www.ncbi.nlm.nih.gov/pubmed/38009003 ID - info:doi/10.2196/51580 ER - TY - JOUR AU - Koranteng, Erica AU - Rao, Arya AU - Flores, Efren AU - Lev, Michael AU - Landman, Adam AU - Dreyer, Keith AU - Succi, Marc PY - 2023/12/28 TI - Empathy and Equity: Key Considerations for Large Language Model Adoption in Health Care JO - JMIR Med Educ SP - e51199 VL - 9 KW - ChatGPT KW - AI KW - artificial intelligence KW - large language models KW - LLMs KW - ethics KW - empathy KW - equity KW - bias KW - language model KW - health care application KW - patient care KW - care KW - development KW - framework KW - model KW - ethical implication UR - https://mededu.jmir.org/2023/1/e51199 UR - http://dx.doi.org/10.2196/51199 UR - http://www.ncbi.nlm.nih.gov/pubmed/38153778 ID - info:doi/10.2196/51199 ER - TY - JOUR AU - Liao, Wenxiong AU - Liu, Zhengliang AU - Dai, Haixing AU - Xu, Shaochen AU - Wu, Zihao AU - Zhang, Yiyang AU - Huang, Xiaoke AU - Zhu, Dajiang AU - Cai, Hongmin AU - Li, Quanzheng AU - Liu, Tianming AU - Li, Xiang PY - 2023/12/28 TI - Differentiating ChatGPT-Generated and Human-Written Medical Texts: Quantitative Study JO - JMIR Med Educ SP - e48904 VL - 9 KW - ChatGPT KW - medical ethics KW - linguistic analysis KW - text classification KW - artificial intelligence KW - medical texts KW - machine learning N2 - Background: Large language models, such as ChatGPT, are capable of generating grammatically perfect and human-like text content, and a large number of ChatGPT-generated texts have appeared on the internet. However, medical texts, such as clinical notes and diagnoses, require rigorous validation, and erroneous medical content generated by ChatGPT could potentially lead to disinformation that poses significant harm to health care and the general public. Objective: This study is among the first on responsible artificial intelligence–generated content in medicine. 
We focus on analyzing the differences between medical texts written by human experts and those generated by ChatGPT and designing machine learning workflows to effectively detect and differentiate medical texts generated by ChatGPT. Methods: We first constructed a suite of data sets containing medical texts written by human experts and generated by ChatGPT. We analyzed the linguistic features of these 2 types of content and uncovered differences in vocabulary, parts-of-speech, dependency, sentiment, perplexity, and other aspects. Finally, we designed and implemented machine learning methods to detect medical text generated by ChatGPT. The data and code used in this paper are published on GitHub. Results: Medical texts written by humans were more concrete, more diverse, and typically contained more useful information, while medical texts generated by ChatGPT paid more attention to fluency and logic and usually expressed general terminologies rather than effective information specific to the context of the problem. A bidirectional encoder representations from transformers–based model effectively detected medical texts generated by ChatGPT, and the F1 score exceeded 95%. Conclusions: Although text generated by ChatGPT is grammatically perfect and human-like, the linguistic characteristics of generated medical texts were different from those written by human experts. Medical text generated by ChatGPT could be effectively detected by the proposed machine learning algorithms. This study provides a pathway toward trustworthy and accountable use of large language models in medicine. 
UR - https://mededu.jmir.org/2023/1/e48904 UR - http://dx.doi.org/10.2196/48904 UR - http://www.ncbi.nlm.nih.gov/pubmed/38153785 ID - info:doi/10.2196/48904 ER - TY - JOUR AU - Powell, Leigh AU - Nour, Radwa AU - Sleibi, Randa AU - Al Suwaidi, Hanan AU - Zary, Nabil PY - 2023/12/28 TI - Democratizing the Development of Chatbots to Improve Public Health: Feasibility Study of COVID-19 Misinformation JO - JMIR Hum Factors SP - e43120 VL - 10 KW - COVID-19 KW - vaccine hesitancy KW - infodemic KW - chatbot KW - motivational interviewing KW - social media KW - conversational agent KW - misinformation KW - online health information KW - usability study KW - vaccine misinformation N2 - Background: Chatbots enable users to have humanlike conversations on various topics and can vary widely in complexity and functionality. An area of research priority in chatbots is democratizing chatbots to all, removing barriers to entry, such as financial ones, to help make chatbots a possibility for the wider global population to improve access to information, help reduce the digital divide between nations, and improve areas of public good (eg, health communication). Chatbots in this space may help create the potential for improved health outcomes, potentially alleviating some of the burdens on health care providers and systems to be the sole voices of outreach to public health. Objective: This study explored the feasibility of developing a chatbot using approaches that are accessible in low- and middle-resource settings, such as using technology that is low cost, can be developed by nonprogrammers, and can be deployed over social media platforms to reach the broadest-possible audience without the need for a specialized technical team. Methods: This study is presented in 2 parts. First, we detailed the design and development of a chatbot, VWise, including the resources used and development considerations for the conversational model. 
Next, we conducted a case study of 33 participants who engaged in a pilot with our chatbot. We explored the following 3 research questions: (1) Is it feasible to develop and implement a chatbot addressing a public health issue with only minimal resources? (2) What is the participants' experience with using the chatbot? (3) What kinds of measures of engagement are observed from using the chatbot? Results: A high level of engagement with the chatbot was demonstrated by the large number of participants who stayed with the conversation to its natural end (n=17, 52%), requested to see the free online resource, selected to view all information about a given concern, and returned to have a dialogue about a second concern (n=12, 36%). Conclusions: This study explored the feasibility of and the design and development considerations for a chatbot, VWise. Our early findings from this initial pilot suggest that developing a functioning and low-cost chatbot is feasible, even in low-resource environments. Our results show that low-resource environments can enter the health communication chatbot space using readily available human and technical resources. However, despite these early indicators, many limitations exist in this study and further work with a larger sample size and greater diversity of participants is needed. This study represents early work on a chatbot in its virtual infancy. We hope this study will help provide those who feel chatbot access may be out of reach with a useful guide to enter this space, enabling more democratized access to chatbots for all. 
UR - https://humanfactors.jmir.org/2023/1/e43120 UR - http://dx.doi.org/10.2196/43120 UR - http://www.ncbi.nlm.nih.gov/pubmed/37290040 ID - info:doi/10.2196/43120 ER - TY - JOUR AU - Tan, Chin Tze AU - Roslan, Binte Nur Emillia AU - Li, Weiquan James AU - Zou, Xinying AU - Chen, Xiangmei AU - Ratnasari AU - Santosa, Anindita PY - 2023/12/28 TI - Patient Acceptability of Symptom Screening and Patient Education Using a Chatbot for Autoimmune Inflammatory Diseases: Survey Study JO - JMIR Form Res SP - e49239 VL - 7 KW - conversational agents KW - digital technology in medicine KW - rheumatology KW - early diagnosis KW - education KW - patient–physician interactions KW - autoimmune rheumatic diseases KW - chatbot KW - implementation KW - patient survey KW - digital health intervention N2 - Background: Chatbots have the potential to enhance health care interaction, satisfaction, and service delivery. However, data regarding their acceptance across diverse patient populations are limited. In-depth studies on the reception of chatbots by patients with chronic autoimmune inflammatory diseases are lacking, although such studies are vital for facilitating the effective integration of chatbots in rheumatology care. Objective: We aim to assess patient perceptions and acceptance of a chatbot designed for autoimmune inflammatory rheumatic diseases (AIIRDs). Methods: We administered a comprehensive survey in an outpatient setting at a top-tier rheumatology referral center. The target cohort included patients who interacted with a chatbot explicitly tailored to facilitate diagnosis and obtain information on AIIRDs. Following the RE-AIM (Reach, Effectiveness, Adoption, Implementation and Maintenance) framework, the survey was designed to gauge the effectiveness, user acceptability, and implementation of the chatbot. Results: Between June and October 2022, we received survey responses from 200 patients, with an equal number of 100 initial consultations and 100 follow-up (FU) visits. 
The mean scores on a 5-point acceptability scale ranged from 4.01 (SD 0.63) to 4.41 (SD 0.54), indicating consistently high ratings across the different aspects of chatbot performance. Multivariate regression analysis indicated that having a FU visit was significantly associated with a greater willingness to reuse the chatbot for symptom determination (P=.01). Further, patients' comfort with chatbot diagnosis increased significantly after meeting physicians (P<.001). We observed no significant differences in chatbot acceptance according to sex, education level, or diagnosis category. Conclusions: This study underscores that chatbots tailored to AIIRDs have a favorable reception. The inclination of FU patients to engage with the chatbot signifies the possible influence of past clinical encounters and physician affirmation on its use. Although further exploration is required to refine their integration, the prevalent positive perceptions suggest that chatbots have the potential to strengthen the bridge between patients and health care providers, thus enhancing the delivery of rheumatology care to various cohorts. 
UR - https://formative.jmir.org/2023/1/e49239 UR - http://dx.doi.org/10.2196/49239 UR - http://www.ncbi.nlm.nih.gov/pubmed/37219234 ID - info:doi/10.2196/49239 ER - TY - JOUR AU - Ćirković, Aleksandar AU - Katz, Toam PY - 2023/12/28 TI - Exploring the Potential of ChatGPT-4 in Predicting Refractive Surgery Categorizations: Comparative Study JO - JMIR Form Res SP - e51798 VL - 7 KW - artificial intelligence KW - machine learning KW - decision support systems KW - clinical KW - refractive surgical procedures KW - risk assessment KW - ophthalmology KW - health informatics KW - predictive modeling KW - data analysis KW - medical decision-making KW - eHealth KW - ChatGPT-4 KW - ChatGPT KW - refractive surgery KW - categorization KW - AI-powered algorithm KW - large language model KW - decision-making N2 - Background: Refractive surgery research aims to optimally precategorize patients by their suitability for various types of surgery. Recent advances have led to the development of artificial intelligence–powered algorithms, including machine learning approaches, to assess risks and enhance workflow. Large language models (LLMs) like ChatGPT-4 (OpenAI LP) have emerged as potential general artificial intelligence tools that can assist across various disciplines, possibly including refractive surgery decision-making. However, their actual capabilities in precategorizing refractive surgery patients based on real-world parameters remain unexplored. Objective: This exploratory study aimed to validate ChatGPT-4's capabilities in precategorizing refractive surgery patients based on commonly used clinical parameters. The goal was to assess whether ChatGPT-4's performance when categorizing batch inputs is comparable to those made by a refractive surgeon. A simple binary set of categories (patient suitable for laser refractive surgery or not) as well as a more detailed set were compared. 
Methods: Data from 100 consecutive patients from a refractive clinic were anonymized and analyzed. Parameters included age, sex, manifest refraction, visual acuity, and various corneal measurements and indices from Scheimpflug imaging. This study compared ChatGPT-4's performance with a clinician's categorizations using Cohen κ coefficient, a chi-square test, a confusion matrix, accuracy, precision, recall, F1-score, and receiver operating characteristic area under the curve. Results: A statistically significant noncoincidental accordance was found between ChatGPT-4 and the clinician's categorizations with a Cohen κ coefficient of 0.399 for 6 categories (95% CI 0.256-0.537) and 0.610 for binary categorization (95% CI 0.372-0.792). The model showed temporal instability and response variability, however. The chi-square test on 6 categories indicated an association between the 2 raters' distributions (χ²₅=94.7, P<.001). Here, the accuracy was 0.68, precision 0.75, recall 0.68, and F1-score 0.70. For 2 categories, the accuracy was 0.88, precision 0.88, recall 0.88, F1-score 0.88, and area under the curve 0.79. Conclusions: This study revealed that ChatGPT-4 exhibits potential as a precategorization tool in refractive surgery, showing promising agreement with clinician categorizations. However, its main limitations include, among others, dependency on solely one human rater, small sample size, the instability and variability of ChatGPT's (OpenAI LP) output between iterations and nontransparency of the underlying models. The results encourage further exploration into the application of LLMs like ChatGPT-4 in health care, particularly in decision-making processes that require understanding vast clinical data. 
Future research should focus on defining the model's accuracy with prompt and vignette standardization, detecting confounding factors, and comparing to other versions of ChatGPT-4 and other LLMs to pave the way for larger-scale validation and real-world implementation. UR - https://formative.jmir.org/2023/1/e51798 UR - http://dx.doi.org/10.2196/51798 UR - http://www.ncbi.nlm.nih.gov/pubmed/38153777 ID - info:doi/10.2196/51798 ER - TY - JOUR AU - Cheng, Shu-Li AU - Tsai, Shih-Jen AU - Bai, Ya-Mei AU - Ko, Chih-Hung AU - Hsu, Chih-Wei AU - Yang, Fu-Chi AU - Tsai, Chia-Kuang AU - Tu, Yu-Kang AU - Yang, Szu-Nian AU - Tseng, Ping-Tao AU - Hsu, Tien-Wei AU - Liang, Chih-Sung AU - Su, Kuan-Pin PY - 2023/12/25 TI - Comparisons of Quality, Correctness, and Similarity Between ChatGPT-Generated and Human-Written Abstracts for Basic Research: Cross-Sectional Study JO - J Med Internet Res SP - e51229 VL - 25 KW - ChatGPT KW - abstract KW - AI-generated scientific content KW - plagiarism KW - artificial intelligence KW - NLP KW - natural language processing KW - LLM KW - language model KW - language models KW - text KW - textual KW - generation KW - generative KW - extract KW - extraction KW - scientific research KW - academic research KW - publication KW - publications KW - abstracts N2 - Background: ChatGPT may act as a research assistant to help organize the direction of thinking and summarize research findings. However, few studies have examined the quality, similarity (abstracts being similar to the original one), and accuracy of the abstracts generated by ChatGPT when researchers provide full-text basic research papers. Objective: We aimed to assess the applicability of an artificial intelligence (AI) model in generating abstracts for basic preclinical research. Methods: We selected 30 basic research papers from Nature, Genome Biology, and Biological Psychiatry. 
Excluding abstracts, we inputted the full text into ChatPDF, an application of a language model based on ChatGPT, and we prompted it to generate abstracts with the same style as used in the original papers. A total of 8 experts were invited to evaluate the quality of these abstracts (based on a Likert scale of 0-10) and identify which abstracts were generated by ChatPDF, using a blind approach. These abstracts were also evaluated for their similarity to the original abstracts and the accuracy of the AI content. Results: The quality of ChatGPT-generated abstracts was lower than that of the actual abstracts (10-point Likert scale: mean 4.72, SD 2.09 vs mean 8.09, SD 1.03; P<.001). The difference in quality was significant in the unstructured format (mean difference −4.33; 95% CI −4.79 to −3.86; P<.001) but minimal in the 4-subheading structured format (mean difference −2.33; 95% CI −2.79 to −1.86). Among the 30 ChatGPT-generated abstracts, 3 showed wrong conclusions, and 10 were identified as AI content. The mean percentage of similarity between the original and the generated abstracts was not high (2.10%-4.40%). The blinded reviewers achieved a 93% (224/240) accuracy rate in guessing which abstracts were written using ChatGPT. Conclusions: Using ChatGPT to generate a scientific abstract may not lead to issues of similarity when using real full texts written by humans. However, the quality of the ChatGPT-generated abstracts was suboptimal, and their accuracy was not 100%. UR - https://www.jmir.org/2023/1/e51229 UR - http://dx.doi.org/10.2196/51229 UR - http://www.ncbi.nlm.nih.gov/pubmed/38145486 ID - info:doi/10.2196/51229 ER - TY - JOUR AU - Knopp, I. Michelle AU - Warm, J. Eric AU - Weber, Danielle AU - Kelleher, Matthew AU - Kinnear, Benjamin AU - Schumacher, J. Daniel AU - Santen, A. 
Sally AU - Mendonça, Eneida AU - Turner, Laurah PY - 2023/12/25 TI - AI-Enabled Medical Education: Threads of Change, Promising Futures, and Risky Realities Across Four Potential Future Worlds JO - JMIR Med Educ SP - e50373 VL - 9 KW - artificial intelligence KW - medical education KW - scenario planning KW - future of healthcare KW - ethics and AI KW - future KW - scenario KW - ChatGPT KW - generative KW - GPT-4 KW - ethic KW - ethics KW - ethical KW - strategic planning KW - Open-AI KW - OpenAI KW - privacy KW - autonomy KW - autonomous N2 - Background: The rapid trajectory of artificial intelligence (AI) development and advancement is quickly outpacing society's ability to determine its future role. As AI continues to transform various aspects of our lives, one critical question arises for medical education: what will be the nature of education, teaching, and learning in a future world where the acquisition, retention, and application of knowledge in the traditional sense are fundamentally altered by AI? Objective: The purpose of this perspective is to plan for the intersection of health care and medical education in the future. Methods: We used GPT-4 and scenario-based strategic planning techniques to craft 4 hypothetical future worlds influenced by AI's integration into health care and medical education. This method, used by organizations such as Shell and the Accreditation Council for Graduate Medical Education, assesses readiness for alternative futures and effectively manages uncertainty, risk, and opportunity. The detailed scenarios provide insights into potential environments the medical profession may face and lay the foundation for hypothesis generation and idea-building regarding responsible AI implementation. Results: The following 4 worlds were created using OpenAI's GPT model: AI Harmony, AI conflict, The world of Ecological Balance, and Existential Risk. 
Risks include disinformation and misinformation, loss of privacy, widening inequity, erosion of human autonomy, and ethical dilemmas. Benefits involve improved efficiency, personalized interventions, enhanced collaboration, early detection, and accelerated research. Conclusions: To ensure responsible AI use, the authors suggest focusing on 3 key areas: developing a robust ethical framework, fostering interdisciplinary collaboration, and investing in education and training. A strong ethical framework emphasizes patient safety, privacy, and autonomy while promoting equity and inclusivity. Interdisciplinary collaboration encourages cooperation among various experts in developing and implementing AI technologies, ensuring that they address the complex needs and challenges in health care and medical education. Investing in education and training prepares professionals and trainees with necessary skills and knowledge to effectively use and critically evaluate AI technologies. The integration of AI in health care and medical education presents a critical juncture between transformative advancements and significant risks. By working together to address both immediate and long-term risks and consequences, we can ensure that AI integration leads to a more equitable, sustainable, and prosperous future for both health care and medical education. As we engage with AI technologies, our collective actions will ultimately determine the state of the future of health care and medical education to harness AI's power while ensuring the safety and well-being of humanity. UR - https://mededu.jmir.org/2023/1/e50373 UR - http://dx.doi.org/10.2196/50373 UR - http://www.ncbi.nlm.nih.gov/pubmed/38145471 ID - info:doi/10.2196/50373 ER - TY - JOUR AU - Ziegelmayer, Sebastian AU - Marka, W. 
Alexander AU - Lenhart, Nicolas AU - Nehls, Nadja AU - Reischl, Stefan AU - Harder, Felix AU - Sauter, Andreas AU - Makowski, Marcus AU - Graf, Markus AU - Gawlitza, Joshua PY - 2023/12/22 TI - Evaluation of GPT-4's Chest X-Ray Impression Generation: A Reader Study on Performance and Perception JO - J Med Internet Res SP - e50865 VL - 25 KW - generative model KW - GPT KW - medical imaging KW - artificial intelligence KW - imaging KW - radiology KW - radiological KW - radiography KW - diagnostic KW - chest KW - x-ray KW - x-rays KW - generative KW - multimodal KW - impression KW - impressions KW - image KW - images KW - AI UR - https://www.jmir.org/2023/1/e50865 UR - http://dx.doi.org/10.2196/50865 UR - http://www.ncbi.nlm.nih.gov/pubmed/38133918 ID - info:doi/10.2196/50865 ER - TY - JOUR AU - Alkhaaldi, I. Saif M. AU - Kassab, H. Carl AU - Dimassi, Zakia AU - Oyoun Alsoud, Leen AU - Al Fahim, Maha AU - Al Hageh, Cynthia AU - Ibrahim, Halah PY - 2023/12/22 TI - Medical Student Experiences and Perceptions of ChatGPT and Artificial Intelligence: Cross-Sectional Study JO - JMIR Med Educ SP - e51302 VL - 9 KW - medical education KW - ChatGPT KW - artificial intelligence KW - large language models KW - LLMs KW - AI KW - medical student KW - medical students KW - cross-sectional study KW - training KW - technology KW - medicine KW - health care professionals KW - risk KW - education N2 - Background: Artificial intelligence (AI) has the potential to revolutionize the way medicine is learned, taught, and practiced, and medical education must prepare learners for these inevitable changes. Academic medicine has, however, been slow to embrace recent AI advances. Since its launch in November 2022, ChatGPT has emerged as a fast and user-friendly large language model that can assist health care professionals, medical educators, students, trainees, and patients. 
While many studies focus on the technology's capabilities, potential, and risks, there is a gap in studying the perspective of end users. Objective: The aim of this study was to gauge the experiences and perspectives of graduating medical students on ChatGPT and AI in their training and future careers. Methods: A cross-sectional web-based survey of recently graduated medical students was conducted in an international academic medical center between May 5, 2023, and June 13, 2023. Descriptive statistics were used to tabulate variable frequencies. Results: Of 325 applicants to the residency programs, 265 completed the survey (an 81.5% response rate). The vast majority of respondents denied using ChatGPT in medical school, with 20.4% (n=54) using it to help complete written assessments and only 9.4% using the technology in their clinical work (n=25). More students planned to use it during residency, primarily for exploring new medical topics and research (n=168, 63.4%) and exam preparation (n=151, 57%). Male students were significantly more likely to believe that AI will improve diagnostic accuracy (n=47, 51.7% vs n=69, 39.7%; P=.001), reduce medical error (n=53, 58.2% vs n=71, 40.8%; P=.002), and improve patient care (n=60, 65.9% vs n=95, 54.6%; P=.007). Previous experience with AI was significantly associated with positive AI perception in terms of improving patient care, decreasing medical errors and misdiagnoses, and increasing the accuracy of diagnoses (P=.001, P<.001, P=.008, respectively). Conclusions: The surveyed medical students had minimal formal and informal experience with AI tools and limited perceptions of the potential uses of AI in health care but had overall positive views of ChatGPT and AI and were optimistic about the future of AI in medical education and health care. Structured curricula and formal policies and guidelines are needed to adequately prepare medical learners for the forthcoming integration of AI in medicine. 
UR - https://mededu.jmir.org/2023/1/e51302 UR - http://dx.doi.org/10.2196/51302 UR - http://www.ncbi.nlm.nih.gov/pubmed/38133911 ID - info:doi/10.2196/51302 ER - TY - JOUR AU - Tangadulrat, Pasin AU - Sono, Supinya AU - Tangtrakulwanich, Boonsin PY - 2023/12/22 TI - Using ChatGPT for Clinical Practice and Medical Education: Cross-Sectional Survey of Medical Students' and Physicians' Perceptions JO - JMIR Med Educ SP - e50658 VL - 9 KW - ChatGPT KW - AI KW - artificial intelligence KW - medical education KW - medical students KW - student KW - students KW - intern KW - interns KW - resident KW - residents KW - knee osteoarthritis KW - survey KW - surveys KW - questionnaire KW - questionnaires KW - chatbot KW - chatbots KW - conversational agent KW - conversational agents KW - attitude KW - attitudes KW - opinion KW - opinions KW - perception KW - perceptions KW - perspective KW - perspectives KW - acceptance N2 - Background: ChatGPT is a well-known large language model–based chatbot. It could be used in the medical field in many aspects. However, some physicians are still unfamiliar with ChatGPT and are concerned about its benefits and risks. Objective: We aim to evaluate the perception of physicians and medical students toward using ChatGPT in the medical field. Methods: A web-based questionnaire was sent to medical students, interns, residents, and attending staff with questions regarding their perception toward using ChatGPT in clinical practice and medical education. Participants were also asked to rate their perception of ChatGPT's generated response about knee osteoarthritis. Results: Participants included 124 medical students, 46 interns, 37 residents, and 32 attending staff. After reading ChatGPT's response, 132 of the 239 (55.2%) participants had a positive rating about using ChatGPT for clinical practice. The proportion of positive answers was significantly lower in graduated physicians (48/115, 42%) compared with medical students (84/124, 68%; P<.001). 
Participants listed a lack of a patient-specific treatment plan, updated evidence, and a language barrier as ChatGPT's pitfalls. Regarding using ChatGPT for medical education, the proportion of positive responses was also significantly lower in graduate physicians (71/115, 62%) compared to medical students (103/124, 83.1%; P<.001). Participants were concerned that ChatGPT's response was too superficial, might lack scientific evidence, and might need expert verification. Conclusions: Medical students generally had a positive perception of using ChatGPT for guiding treatment and medical education, whereas graduated doctors were more cautious in this regard. Nonetheless, both medical students and graduated doctors positively perceived using ChatGPT for creating patient educational materials. UR - https://mededu.jmir.org/2023/1/e50658 UR - http://dx.doi.org/10.2196/50658 UR - http://www.ncbi.nlm.nih.gov/pubmed/38133908 ID - info:doi/10.2196/50658 ER - TY - JOUR AU - Xue, Jia AU - Zhang, Bolun AU - Zhao, Yaxi AU - Zhang, Qiaoru AU - Zheng, Chengda AU - Jiang, Jielin AU - Li, Hanjia AU - Liu, Nian AU - Li, Ziqian AU - Fu, Weiying AU - Peng, Yingdong AU - Logan, Judith AU - Zhang, Jingwen AU - Xiang, Xiaoling PY - 2023/12/19 TI - Evaluation of the Current State of Chatbots for Digital Health: Scoping Review JO - J Med Internet Res SP - e47217 VL - 25 KW - artificial intelligence KW - chatbot KW - health KW - mental health KW - suicide KW - suicidal KW - conversational capacity KW - relational capacity KW - personalization KW - in-app reviews KW - experience KW - experiences KW - scoping KW - review methods KW - review methodology KW - chatbots KW - conversational agent KW - conversational agents N2 - Background: Chatbots have become ubiquitous in our daily lives, enabling natural language conversations with users through various modes of communication. Chatbots have the potential to play a significant role in promoting health and well-being. 
As the number of studies and available products related to chatbots continues to rise, there is a critical need to assess product features to enhance the design of chatbots that effectively promote health and behavioral change. Objective: This scoping review aims to provide a comprehensive assessment of the current state of health-related chatbots, including the chatbots' characteristics and features, user backgrounds, communication models, relational building capacity, personalization, interaction, responses to suicidal thoughts, and users' in-app experiences during chatbot use. Through this analysis, we seek to identify gaps in the current research, guide future directions, and enhance the design of health-focused chatbots. Methods: Following the scoping review methodology by Arksey and O'Malley and guided by the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) checklist, this study used a two-pronged approach to identify relevant chatbots: (1) searching the iOS and Android App Stores and (2) reviewing scientific literature through a search strategy designed by a librarian. Overall, 36 chatbots were selected based on predefined criteria from both sources. These chatbots were systematically evaluated using a comprehensive framework developed for this study, including chatbot characteristics, user backgrounds, building relational capacity, personalization, interaction models, responses to critical situations, and user experiences. Ten coauthors were responsible for downloading and testing the chatbots, coding their features, and evaluating their performance in simulated conversations. The testing of all chatbot apps was limited to their free-to-use features. Results: This review provides an overview of the diversity of health-related chatbots, encompassing categories such as mental health support, physical activity promotion, and behavior change interventions. 
Chatbots use text, animations, speech, images, and emojis for communication. The findings highlight variations in conversational capabilities, including empathy, humor, and personalization. Notably, concerns regarding safety, particularly in addressing suicidal thoughts, were evident. Approximately 44% (16/36) of the chatbots effectively addressed suicidal thoughts. User experiences and behavioral outcomes demonstrated the potential of chatbots in health interventions, but evidence remains limited. Conclusions: This scoping review underscores the significance of chatbots in health-related applications and offers insights into their features, functionalities, and user experiences. This study contributes to advancing the understanding of chatbots' role in digital health interventions, thus paving the way for more effective and user-centric health promotion strategies. This study informs future research directions, emphasizing the need for rigorous randomized control trials, standardized evaluation metrics, and user-centered design to unlock the full potential of chatbots in enhancing health and well-being. Future research should focus on addressing limitations, exploring real-world user experiences, and implementing robust data security and privacy measures.
UR - https://www.jmir.org/2023/1/e47217 UR - http://dx.doi.org/10.2196/47217 UR - http://www.ncbi.nlm.nih.gov/pubmed/38113097 ID - info:doi/10.2196/47217 ER - TY - JOUR AU - Wang, Guoyong AU - Gao, Kai AU - Liu, Qianyang AU - Wu, Yuxin AU - Zhang, Kaijun AU - Zhou, Wei AU - Guo, Chunbao PY - 2023/12/14 TI - Potential and Limitations of ChatGPT 3.5 and 4.0 as a Source of COVID-19 Information: Comprehensive Comparative Analysis of Generative and Authoritative Information JO - J Med Internet Res SP - e49771 VL - 25 KW - ChatGPT 3.5 KW - ChatGPT 4.0 KW - artificial intelligence KW - AI KW - COVID-19 KW - pandemic KW - public health KW - information retrieval N2 - Background: The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has necessitated reliable and authoritative information for public guidance. The World Health Organization (WHO) has been a primary source of such information, disseminating it through a question and answer format on its official website. Concurrently, ChatGPT 3.5 and 4.0, a deep learning-based natural language generation system, has shown potential in generating diverse text types based on user input. Objective: This study evaluates the accuracy of COVID-19 information generated by ChatGPT 3.5 and 4.0, assessing its potential as a supplementary public information source during the pandemic. Methods: We extracted 487 COVID-19-related questions from the WHO's official website and used ChatGPT 3.5 and 4.0 to generate corresponding answers. These generated answers were then compared against the official WHO responses for evaluation. Two clinical experts scored the generated answers on a scale of 0-5 across 4 dimensions (accuracy, comprehensiveness, relevance, and clarity), with higher scores indicating better performance in each dimension. The WHO responses served as the reference for this assessment.
Additionally, we used the BERT (Bidirectional Encoder Representations from Transformers) model to generate similarity scores (0-1) between the generated and official answers, providing a dual validation mechanism. Results: The mean (SD) scores for ChatGPT 3.5-generated answers were 3.47 (0.725) for accuracy, 3.89 (0.719) for comprehensiveness, 4.09 (0.787) for relevance, and 3.49 (0.809) for clarity. For ChatGPT 4.0, the mean (SD) scores were 4.15 (0.780), 4.47 (0.641), 4.56 (0.600), and 4.09 (0.698), respectively. All differences were statistically significant (P<.001), with ChatGPT 4.0 outperforming ChatGPT 3.5. The BERT model verification showed mean (SD) similarity scores of 0.83 (0.07) for ChatGPT 3.5 and 0.85 (0.07) for ChatGPT 4.0 compared with the official WHO answers. Conclusions: ChatGPT 3.5 and 4.0 can generate accurate and relevant COVID-19 information to a certain extent. However, compared with official WHO responses, gaps and deficiencies exist. Thus, users of ChatGPT 3.5 and 4.0 should also reference other reliable information sources to mitigate potential misinformation risks. Notably, ChatGPT 4.0 outperformed ChatGPT 3.5 across all evaluated dimensions, a finding corroborated by BERT model validation. UR - https://www.jmir.org/2023/1/e49771 UR - http://dx.doi.org/10.2196/49771 UR - http://www.ncbi.nlm.nih.gov/pubmed/38096014 ID - info:doi/10.2196/49771 ER - TY - JOUR AU - Singh, Akanksha AU - Schooley, Benjamin AU - Patel, Nitin PY - 2023/12/14 TI - Effects of User-Reported Risk Factors and Follow-Up Care Activities on Satisfaction With a COVID-19 Chatbot: Cross-Sectional Study JO - JMIR Mhealth Uhealth SP - e43105 VL - 11 KW - patient engagement KW - chatbot KW - population health KW - health recommender systems KW - conversational recommender systems KW - design factors KW - COVID-19 N2 - Background: The COVID-19 pandemic influenced many to consider methods to reduce human contact and ease the burden placed on health care workers.
Conversational agents or chatbots are a set of technologies that may aid with these challenges. They may provide useful interactions for users, potentially reducing the health care worker burden while increasing user satisfaction. Research aims to understand these potential impacts of chatbots and conversational recommender systems and their associated design features. Objective: The objective of this study was to evaluate user perceptions of the helpfulness of an artificial intelligence chatbot that was offered free to the public in response to COVID-19. The chatbot engaged patients and provided educational information and the opportunity to report symptoms, understand personal risks, and receive referrals for care. Methods: A cross-sectional study design was used to analyze 82,222 chats collected from patients in South Carolina seeking services from the Prisma Health system. Chi-square tests and multinomial logistic regression analyses were conducted to assess the relationship between reported risk factors and perceived chat helpfulness using chats started between April 24, 2020, and April 21, 2022. Results: A total of 82,222 chat series were started with at least one question or response on record; 53,805 symptom checker questions with at least one COVID-19-related activity series were completed, with 5191 individuals clicking further to receive a virtual video visit and 2215 clicking further to make an appointment with a local physician. Patients who were aged >65 years (P<.001), reported comorbidities (P<.001), had been in contact with a person with COVID-19 in the last 14 days (P<.001), and responded to symptom checker questions that placed them at a higher risk of COVID-19 (P<.001) were 1.8 times more likely to report the chat as helpful than those who reported lower risk factors.
Users who engaged with the chatbot to conduct a series of activities were more likely to find the chat helpful (P<.001), including seeking COVID-19 information (3.97-4.07 times), in-person appointments (2.46-1.99 times), telehealth appointments with a nearby provider (2.48-1.9 times), or vaccination (2.9-3.85 times) compared with those who did not perform any of these activities. Conclusions: Chatbots that are designed to target high-risk user groups and provide relevant actionable items may be perceived as a helpful approach to early contact with the health system for assessing communicable disease symptoms and follow-up care options at home before virtual or in-person contact with health care providers. The results identified and validated significant design factors for conversational recommender systems, including triangulating a high-risk target user population and providing relevant actionable items for users to choose from as part of user engagement. UR - https://mhealth.jmir.org/2023/1/e43105 UR - http://dx.doi.org/10.2196/43105 UR - http://www.ncbi.nlm.nih.gov/pubmed/38096007 ID - info:doi/10.2196/43105 ER - TY - JOUR AU - O'Hagan, Ross AU - Poplausky, Dina AU - Young, N. 
Jade AU - Gulati, Nicholas AU - Levoska, Melissa AU - Ungar, Benjamin AU - Ungar, Jonathan PY - 2023/12/14 TI - The Accuracy and Appropriateness of ChatGPT Responses on Nonmelanoma Skin Cancer Information Using Zero-Shot Chain of Thought Prompting JO - JMIR Dermatol SP - e49889 VL - 6 KW - ChatGPT KW - artificial intelligence KW - large language models KW - nonmelanoma skin KW - skin cancer KW - cell carcinoma KW - chatbot KW - dermatology KW - dermatologist KW - epidermis KW - dermis KW - oncology KW - cancer UR - https://derma.jmir.org/2023/1/e49889 UR - http://dx.doi.org/10.2196/49889 UR - http://www.ncbi.nlm.nih.gov/pubmed/38096013 ID - info:doi/10.2196/49889 ER - TY - JOUR AU - Minian, Nadia AU - Mehra, Kamna AU - Earle, Mackenzie AU - Hafuth, Sowsan AU - Ting-A-Kee, Ryan AU - Rose, Jonathan AU - Veldhuizen, Scott AU - Zawertailo, Laurie AU - Ratto, Matt AU - Melamed, C. Osnat AU - Selby, Peter PY - 2023/12/11 TI - AI Conversational Agent to Improve Varenicline Adherence: Protocol for a Mixed Methods Feasibility Study JO - JMIR Res Protoc SP - e53556 VL - 12 KW - evaluation KW - health bot KW - medication adherence KW - smoking cessation KW - varenicline KW - artificial intelligence KW - AI N2 - Background: Varenicline is a pharmacological intervention for tobacco dependence that is safe and effective in facilitating smoking cessation. Enhanced adherence to varenicline augments the probability of prolonged smoking abstinence. However, research has shown that one-third of people who use varenicline are nonadherent by the second week. There is evidence showing that behavioral support helps with medication adherence. We have designed an artificial intelligence (AI) conversational agent or health bot, called "ChatV," based on evidence of what works as well as what varenicline is, that can provide these supports. ChatV is an evidence-based, patient- and health care provider-informed health bot to improve adherence to varenicline.
ChatV has been programmed to provide medication reminders, answer questions about varenicline and smoking cessation, and track medication intake and the number of cigarettes. Objective: This study aims to explore the feasibility of the ChatV health bot, to examine if it is used as intended, and to determine the appropriateness of proceeding with a randomized controlled trial. Methods: We will conduct a mixed methods feasibility study where we will pilot-test ChatV with 40 participants. Participants will be provided with a standard 12-week varenicline regimen and access to ChatV. Passive data collection will include adoption measures (how often participants use the chatbot, what features they used, when they used it, etc). In addition, participants will complete questionnaires (at 1, 4, 8, and 12 weeks) assessing self-reported smoking status and varenicline adherence, as well as questions regarding the acceptability, appropriateness, and usability of the chatbot, and participate in an interview assessing acceptability, appropriateness, fidelity, and adoption. We will use "stop, amend, and go" progression criteria for pilot studies to decide if a randomized controlled trial is a reasonable next step and what modifications are required. A health equity lens will be adopted during participant recruitment and data analysis to understand and address the differences in uptake and use of this digital health solution among diverse sociodemographic groups. The taxonomy of implementation outcomes will be used to assess feasibility, that is, acceptability, appropriateness, fidelity, adoption, and usability. In addition, medication adherence and smoking cessation will be measured to assess the preliminary treatment effect. Interview data will be analyzed using the framework analysis method. Results: Participant enrollment for the study will begin in January 2024.
Conclusions: By using predetermined progression criteria, the results of this preliminary study will inform the determination of whether to advance toward a larger randomized controlled trial to test the effectiveness of the health bot. Additionally, this study will explore the acceptability, appropriateness, fidelity, adoption, and usability of the health bot. These insights will be instrumental in refining the intervention and the health bot. Trial Registration: ClinicalTrials.gov NCT05997901; https://classic.clinicaltrials.gov/ct2/show/NCT05997901 International Registered Report Identifier (IRRID): PRR1-10.2196/53556 UR - https://www.researchprotocols.org/2023/1/e53556 UR - http://dx.doi.org/10.2196/53556 UR - http://www.ncbi.nlm.nih.gov/pubmed/38079201 ID - info:doi/10.2196/53556 ER - TY - JOUR AU - Mercado, José AU - Espinosa-Curiel, Edrein Ismael AU - Martínez-Miranda, Juan PY - 2023/12/8 TI - Embodied Conversational Agents Providing Motivational Interviewing to Improve Health-Related Behaviors: Scoping Review JO - J Med Internet Res SP - e52097 VL - 25 KW - embodied conversational agent KW - ECA KW - motivational interview KW - MI KW - health-related behaviors KW - virtual agents KW - mobile phone N2 - Background: Embodied conversational agents (ECAs) are advanced human-like interfaces that engage users in natural face-to-face conversations and interactions. These traits position ECAs as innovative tools for delivering interventions for promoting health-related behavior adoption. This includes motivational interviewing (MI), a therapeutic approach that combines brief interventions with motivational techniques to encourage the adoption of healthier behaviors. 
Objective: This study aims to identify the health issues addressed by ECAs delivering MI interventions, explore the key characteristics of these ECAs (eg, appearance, dialogue mechanism, emotional model), analyze the implementation of MI principles and techniques within ECAs, and examine the evaluation methods and primary outcomes of studies that use ECAs providing MI interventions. Methods: We conducted a scoping review following the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) methodology. Our systematic search covered the PubMed, Scopus, IEEE Xplore, ACM Digital, and PsycINFO databases for papers published between January 2008 and December 2022. We included papers describing ECAs developed for delivering MI interventions targeting health-related behaviors and excluded articles that did not describe ECAs with human appearances and without the necessary evaluation or MI explanation. In a multistage process, 3 independent reviewers performed screening and data extraction, and the collected data were synthesized using a narrative approach. Results: The initial search identified 404 articles, of which 3.5% (n=14) were included in the review. ECAs primarily focused on reducing alcohol use (n=5, 36%), took on female representations (n=9, 64%), and gave limited consideration to user ethnicity (n=9, 64%). Most of them used rules-driven dialogue mechanisms (n=13, 93%) and included emotional behavior to convey empathy (n=8, 57%), but without automatic recognition of user emotions (n=12, 86%). Regarding MI implementation, of 14 studies, 3 (21%) covered all MI principles, 4 (29%) included all processes, and none covered all techniques. Most studies (8/14, 57%) conducted acceptability, usability, and user experience assessments, whereas a smaller proportion (4/14, 29%) used randomized controlled trials to evaluate behavior changes.
Overall, the studies reported positive results regarding acceptability, usability, and user experience and showed promising outcomes in changes in attitudes, beliefs, motivation, and behavior. Conclusions: This study revealed significant advancements in the use of ECAs for delivering MI interventions aimed at promoting healthier behaviors over the past 15 years. However, this review emphasizes the need for a more in-depth exploration of ECA characteristics. In addition, there is a need for the enhanced integration of MI principles, processes, and techniques into ECAs. Although acceptability and usability have received considerable attention, there is a compelling argument for placing a stronger emphasis on assessing changes in attitudes, beliefs, motivation, and behavior. Consequently, inclusion of more randomized controlled trials is essential for comprehensive intervention evaluations. UR - https://www.jmir.org/2023/1/e52097 UR - http://dx.doi.org/10.2196/52097 UR - http://www.ncbi.nlm.nih.gov/pubmed/38064707 ID - info:doi/10.2196/52097 ER - TY - JOUR AU - Bragazzi, Luigi Nicola AU - Crapanzano, Andrea AU - Converti, Manlio AU - Zerbetto, Riccardo AU - Khamisy-Farah, Rola PY - 2023/12/6 TI - The Impact of Generative Conversational Artificial Intelligence on the Lesbian, Gay, Bisexual, Transgender, and Queer Community: Scoping Review JO - J Med Internet Res SP - e52091 VL - 25 KW - generative conversational artificial intelligence KW - chatbot KW - lesbian, gay, bisexual, transgender, and queer community KW - LGBTQ KW - scoping review KW - mobile phone N2 - Background: Despite recent significant strides toward acceptance, inclusion, and equality, members of the lesbian, gay, bisexual, transgender, and queer (LGBTQ) community still face alarming mental health disparities, being almost 3 times more likely to experience depression, anxiety, and suicidal thoughts than their heterosexual counterparts. 
These unique psychological challenges are due to discrimination, stigmatization, and identity-related struggles and can potentially benefit from generative conversational artificial intelligence (AI). As the latest advancement in AI, conversational agents and chatbots can imitate human conversation and support mental health, fostering diversity and inclusivity, combating stigma, and countering discrimination. In contrast, if not properly designed, they can perpetuate exclusion and inequities. Objective: This study aims to examine the impact of generative conversational AI on the LGBTQ community. Methods: This study was designed as a scoping review. Four electronic scholarly databases (Scopus, Embase, Web of Science, and MEDLINE via PubMed) and gray literature (Google Scholar) were consulted from inception without any language restrictions. Original studies focusing on the LGBTQ community or counselors working with this community exposed to chatbots and AI-enhanced internet-based platforms and exploring the feasibility, acceptance, or effectiveness of AI-enhanced tools were deemed eligible. The findings were reported in accordance with the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews). Results: Seven applications (HIVST-Chatbot, TelePrEP Navigator, Amanda Selfie, Crisis Contact Simulator, REALbot, Tough Talks, and Queer AI) were included and reviewed. The chatbots and internet-based assistants identified served various purposes: (1) to identify LGBTQ individuals at risk of suicide or contracting HIV or other sexually transmitted infections, (2) to provide resources to LGBTQ youth from underserved areas, (3) to facilitate HIV status disclosure to sex partners, and (4) to develop training role-play personas encompassing the diverse experiences and intersecting identities of LGBTQ youth to educate counselors. The use of generative conversational AI for the LGBTQ community is still in its early stages.
Initial studies have found that deploying chatbots is feasible and well received, with high ratings for usability and user satisfaction. However, there is room for improvement in terms of the content provided and making conversations more engaging and interactive. Many of these studies used small sample sizes and short-term interventions measuring limited outcomes. Conclusions: Generative conversational AI holds promise, but further development and formal evaluation are needed, including studies with larger samples, longer interventions, and randomized trials to compare different content, delivery methods, and dissemination platforms. In addition, a focus on engagement with behavioral objectives is essential to advance this field. The findings have broad practical implications, highlighting that AI's impact spans various aspects of people's lives. Assessing AI's impact on diverse communities and adopting diversity-aware and intersectional approaches can help shape AI's positive impact on society as a whole.
UR - https://www.jmir.org/2023/1/e52091 UR - http://dx.doi.org/10.2196/52091 UR - http://www.ncbi.nlm.nih.gov/pubmed/37864350 ID - info:doi/10.2196/52091 ER - TY - JOUR AU - Watari, Takashi AU - Takagi, Soshi AU - Sakaguchi, Kota AU - Nishizaki, Yuji AU - Shimizu, Taro AU - Yamamoto, Yu AU - Tokuda, Yasuharu PY - 2023/12/6 TI - Performance Comparison of ChatGPT-4 and Japanese Medical Residents in the General Medicine In-Training Examination: Comparison Study JO - JMIR Med Educ SP - e52202 VL - 9 KW - ChatGPT KW - artificial intelligence KW - medical education KW - clinical training KW - non-English language KW - ChatGPT-4 KW - Japan KW - Japanese KW - Asia KW - Asian KW - exam KW - examination KW - exams KW - examinations KW - NLP KW - natural language processing KW - LLM KW - language model KW - language models KW - performance KW - response KW - responses KW - answer KW - answers KW - chatbot KW - chatbots KW - conversational agent KW - conversational agents KW - reasoning KW - clinical KW - GM-ITE KW - self-assessment KW - residency programs N2 - Background: The reliability of GPT-4, a state-of-the-art expansive language model specializing in clinical reasoning and medical knowledge, remains largely unverified across non-English languages. Objective: This study aims to compare fundamental clinical competencies between Japanese residents and GPT-4 by using the General Medicine In-Training Examination (GM-ITE). Methods: We used the GPT-4 model provided by OpenAI and the GM-ITE examination questions for the years 2020, 2021, and 2022 to conduct a comparative analysis. This analysis focused on evaluating the performance of individuals who were concluding their second year of residency in comparison to that of GPT-4. Given the current abilities of GPT-4, our study included only single-choice exam questions, excluding those involving audio, video, or image data. 
The assessment included 4 categories: general theory (professionalism and medical interviewing), symptomatology and clinical reasoning, physical examinations and clinical procedures, and specific diseases. Additionally, we categorized the questions into 7 specialty fields and 3 levels of difficulty, which were determined based on residents' correct response rates. Results: Upon examination of 137 GM-ITE questions in Japanese, GPT-4 scores were significantly higher than the mean scores of residents (residents: 55.8%, GPT-4: 70.1%; P<.001). In terms of specific disciplines, GPT-4 scored 23.5 points higher in the "specific diseases," 30.9 points higher in "obstetrics and gynecology," and 26.1 points higher in "internal medicine." In contrast, GPT-4 scores in "medical interviewing and professionalism," "general practice," and "psychiatry" were lower than those of the residents, although this discrepancy was not statistically significant. Upon analyzing scores based on question difficulty, GPT-4 scores were 17.2 points lower for easy problems (P=.007) but were 25.4 and 24.4 points higher for normal and difficult problems, respectively (P<.001). In year-on-year comparisons, GPT-4 scores were 21.7 and 21.5 points higher in the 2020 (P=.01) and 2022 (P=.003) examinations, respectively, but only 3.5 points higher in the 2021 examinations (no significant difference). Conclusions: In the Japanese language, GPT-4 also outperformed the average medical residents in the GM-ITE test, originally designed for them. Specifically, GPT-4 demonstrated a tendency to score higher on difficult questions with low resident correct response rates and those demanding a more comprehensive understanding of diseases. However, GPT-4 scored comparatively lower on questions that residents could readily answer, such as those testing attitudes toward patients and professionalism, as well as those necessitating an understanding of context and communication.
These findings highlight the strengths and limitations of artificial intelligence applications in medical education and practice. UR - https://mededu.jmir.org/2023/1/e52202 UR - http://dx.doi.org/10.2196/52202 UR - http://www.ncbi.nlm.nih.gov/pubmed/38055323 ID - info:doi/10.2196/52202 ER - TY - JOUR AU - Loveys, Kate AU - Lloyd, Erica AU - Sagar, Mark AU - Broadbent, Elizabeth PY - 2023/12/5 TI - Development of a Virtual Human for Supporting Tobacco Cessation During the COVID-19 Pandemic JO - J Med Internet Res SP - e42310 VL - 25 KW - virtual human KW - conversational agent KW - tobacco cessation KW - eHealth KW - COVID-19 KW - public health KW - virtual health worker KW - smoking cessation KW - artificial intelligence KW - AI KW - chatbot KW - digital health intervention KW - web-based health KW - mobile phone UR - https://www.jmir.org/2023/1/e42310 UR - http://dx.doi.org/10.2196/42310 UR - http://www.ncbi.nlm.nih.gov/pubmed/38051571 ID - info:doi/10.2196/42310 ER - TY - JOUR AU - Thirunavukarasu, James Arun PY - 2023/12/5 TI - How Can the Clinical Aptitude of AI Assistants Be Assayed? JO - J Med Internet Res SP - e51603 VL - 25 KW - artificial intelligence KW - AI KW - validation KW - clinical decision aid KW - artificial general intelligence KW - foundation models KW - large language models KW - LLM KW - language model KW - ChatGPT KW - chatbot KW - chatbots KW - conversational agent KW - conversational agents KW - pitfall KW - pitfalls KW - pain point KW - pain points KW - implementation KW - barrier KW - barriers KW - challenge KW - challenges UR - https://www.jmir.org/2023/1/e51603 UR - http://dx.doi.org/10.2196/51603 UR - http://www.ncbi.nlm.nih.gov/pubmed/38051572 ID - info:doi/10.2196/51603 ER - TY - JOUR AU - Peven, Kimberly AU - Wickham, P. Aidan AU - Wilks, Octavia AU - Kaplan, C. Yusuf AU - Marhol, Andrei AU - Ahmed, Saddif AU - Bamford, Ryan AU - Cunningham, C. 
Adam AU - Prentice, Carley AU - Meczner, András AU - Fenech, Matthew AU - Gilbert, Stephen AU - Klepchukova, Anna AU - Ponzo, Sonia AU - Zhaunova, Liudmila PY - 2023/12/5 TI - Assessment of a Digital Symptom Checker Tool's Accuracy in Suggesting Reproductive Health Conditions: Clinical Vignettes Study JO - JMIR Mhealth Uhealth SP - e46718 VL - 11 KW - women's health KW - symptom checkers KW - symptom checker KW - digital health KW - chatbot KW - accuracy KW - eHealth apps KW - mobile phone KW - mobile health KW - mHealth KW - mobile health app KW - polycystic ovary syndrome KW - gynecology KW - digital health tool KW - endometriosis KW - uterus KW - uterine KW - uterine fibroids KW - vignettes KW - clinical vignettes N2 - Background: Reproductive health conditions such as endometriosis, uterine fibroids, and polycystic ovary syndrome (PCOS) affect a large proportion of women and people who menstruate worldwide. Prevalence estimates for these conditions range from 5% to 40% of women of reproductive age. Long diagnostic delays, up to 12 years, are common and contribute to health complications and increased health care costs. Symptom checker apps provide users with information and tools to better understand their symptoms and thus have the potential to reduce the time to diagnosis for reproductive health conditions. Objective: This study aimed to evaluate the agreement between clinicians and 3 symptom checkers (developed by Flo Health UK Limited) in assessing symptoms of endometriosis, uterine fibroids, and PCOS using vignettes. We also aimed to present a robust example of vignette case creation, review, and classification in the context of predeployment testing and validation of digital health symptom checker tools. 
Methods: Independent general practitioners were recruited to create clinical case vignettes of simulated users for the purpose of testing each condition symptom checker; vignettes created for each condition contained a mixture of condition-positive and condition-negative outcomes. A second panel of general practitioners then reviewed, approved, and modified (if necessary) each vignette. A third group of general practitioners reviewed each vignette case and designated a final classification. Vignettes were then entered into the symptom checkers by a fourth, different group of general practitioners. The outcomes of each symptom checker were then compared with the final classification of each vignette to produce accuracy metrics including percent agreement, sensitivity, specificity, positive predictive value, and negative predictive value. Results: A total of 24 cases were created per condition. Overall, exact matches between the vignette general practitioner classification and the symptom checker outcome were 83% (n=20) for endometriosis, 83% (n=20) for uterine fibroids, and 88% (n=21) for PCOS. For each symptom checker, sensitivity was reported as 81.8% for endometriosis, 84.6% for uterine fibroids, and 100% for PCOS; specificity was reported as 84.6% for endometriosis, 81.8% for uterine fibroids, and 75% for PCOS; positive predictive value was reported as 81.8% for endometriosis, 84.6% for uterine fibroids, and 80% for PCOS; and negative predictive value was reported as 84.6% for endometriosis, 81.8% for uterine fibroids, and 100% for PCOS. Conclusions: The single-condition symptom checkers have high levels of agreement with general practitioner classification for endometriosis, uterine fibroids, and PCOS.
Given long delays in diagnosis for many reproductive health conditions, which lead to increased medical costs and potential health complications for individuals and health care providers, innovative health apps and symptom checkers hold the potential to improve care pathways. UR - https://mhealth.jmir.org/2023/1/e46718 UR - http://dx.doi.org/10.2196/46718 UR - http://www.ncbi.nlm.nih.gov/pubmed/38051574 ID - info:doi/10.2196/46718 ER - TY - JOUR AU - Buhr, Raphael Christoph AU - Smith, Harry AU - Huppertz, Tilman AU - Bahr-Hamm, Katharina AU - Matthias, Christoph AU - Blaikie, Andrew AU - Kelsey, Tom AU - Kuhn, Sebastian AU - Eckrich, Jonas PY - 2023/12/5 TI - ChatGPT Versus Consultants: Blinded Evaluation on Answering Otorhinolaryngology Case-Based Questions JO - JMIR Med Educ SP - e49183 VL - 9 KW - large language models KW - LLMs KW - LLM KW - artificial intelligence KW - AI KW - ChatGPT KW - otorhinolaryngology KW - ORL KW - digital health KW - chatbots KW - global health KW - low- and middle-income countries KW - telemedicine KW - telehealth KW - language model KW - chatbot N2 - Background: Large language models (LLMs), such as ChatGPT (Open AI), are increasingly used in medicine and supplement standard search engines as information sources. This leads to more "consultations" of LLMs about personal medical symptoms. Objective: This study aims to evaluate ChatGPT's performance in answering clinical case-based questions in otorhinolaryngology (ORL) in comparison to ORL consultants' answers. Methods: We used 41 case-based questions from established ORL study books and past German state examinations for doctors. The questions were answered by both ORL consultants and ChatGPT 3. ORL consultants rated all responses, except their own, on medical adequacy, conciseness, coherence, and comprehensibility using a 6-point Likert scale. They also identified (in a blinded setting) if the answer was created by an ORL consultant or ChatGPT.
Additionally, the character count was compared. Due to the rapidly evolving pace of technology, a comparison between responses generated by ChatGPT 3 and ChatGPT 4 was included to give an insight into the evolving potential of LLMs. Results: Ratings in all categories were significantly higher for ORL consultants (P<.001). Although inferior to the scores of the ORL consultants, ChatGPT's scores were relatively higher in semantic categories (conciseness, coherence, and comprehensibility) compared to medical adequacy. ORL consultants identified ChatGPT as the source correctly in 98.4% (121/123) of cases. ChatGPT's answers had a significantly higher character count compared to ORL consultants (P<.001). Comparison between responses generated by ChatGPT 3 and ChatGPT 4 showed a slight improvement in medical accuracy as well as a better coherence of the answers provided. Contrarily, neither the conciseness (P=.06) nor the comprehensibility (P=.08) improved significantly despite the significant increase in the mean amount of characters by 52.5% (n=(1470-964)/964; P<.001). Conclusions: While ChatGPT provided longer answers to medical problems, medical adequacy and conciseness were significantly lower compared to ORL consultants' answers. LLMs have potential as augmentative tools for medical care, but their "consultation" for medical problems carries a high risk of misinformation as their high semantic quality may mask contextual deficits.
UR - https://mededu.jmir.org/2023/1/e49183 UR - http://dx.doi.org/10.2196/49183 UR - http://www.ncbi.nlm.nih.gov/pubmed/38051578 ID - info:doi/10.2196/49183 ER - TY - JOUR AU - Shimizu, Ikuo AU - Kasai, Hajime AU - Shikino, Kiyoshi AU - Araki, Nobuyuki AU - Takahashi, Zaiya AU - Onodera, Misaki AU - Kimura, Yasuhiko AU - Tsukamoto, Tomoko AU - Yamauchi, Kazuyo AU - Asahina, Mayumi AU - Ito, Shoichi AU - Kawakami, Eiryo PY - 2023/11/30 TI - Developing Medical Education Curriculum Reform Strategies to Address the Impact of Generative AI: Qualitative Study JO - JMIR Med Educ SP - e53466 VL - 9 KW - artificial intelligence KW - curriculum reform KW - generative artificial intelligence KW - large language models KW - medical education KW - qualitative analysis KW - strengths-weaknesses-opportunities-threats (SWOT) framework N2 - Background: Generative artificial intelligence (GAI), represented by large language models, has the potential to transform health care and medical education. In particular, GAI's impact on higher education has the potential to change students' learning experience as well as faculty's teaching. However, concerns have been raised about ethical considerations and the decreased reliability of existing examinations. Furthermore, in medical education, curriculum reform is required to adapt to the revolutionary changes brought about by the integration of GAI into medical practice and research. Objective: This study analyzes the impact of GAI on medical education curricula and explores strategies for adaptation. Methods: The study was conducted in the context of faculty development at a medical school in Japan. A workshop involving faculty and students was organized, and participants were divided into groups to address two research questions: (1) How does GAI affect undergraduate medical education curricula? and (2) How should medical school curricula be reformed to address the impact of GAI? 
The strengths, weaknesses, opportunities, and threats (SWOT) framework was applied, and cross-SWOT matrix analysis was used to devise strategies. Further, 4 researchers conducted content analysis on the data generated during the workshop discussions. Results: The data were collected from 8 groups comprising 55 participants. Further, 5 themes about the impact of GAI on medical education curricula emerged: improvement of teaching and learning, improved access to information, inhibition of existing learning processes, problems in GAI, and changes in physicians' professionalism. Positive impacts included enhanced teaching and learning efficiency and improved access to information, whereas negative impacts included concerns about reduced independent thinking and the adaptability of existing assessment methods. Further, GAI was perceived to change the nature of physicians' expertise. Three themes emerged from the cross-SWOT analysis for curriculum reform: (1) learning about GAI, (2) learning with GAI, and (3) learning aside from GAI. Participants recommended incorporating GAI literacy, ethical considerations, and compliance into the curriculum. Learning with GAI involved improving learning efficiency, supporting information gathering and dissemination, and facilitating patient involvement. Learning aside from GAI emphasized maintaining GAI-free learning processes, fostering higher cognitive domains of learning, and introducing more communication exercises. Conclusions: This study highlights the profound impact of GAI on medical education curricula and provides insights into curriculum reform strategies. Participants recognized the need for GAI literacy, ethical education, and adaptive learning. Further, GAI was recognized as a tool that can enhance efficiency and involve patients in education. The study also suggests that medical education should focus on competencies that GAI can hardly replace, such as clinical experience and communication. 
Notably, involving both faculty and students in curriculum reform discussions fosters a sense of ownership and ensures broader perspectives are encompassed. UR - https://mededu.jmir.org/2023/1/e53466 UR - http://dx.doi.org/10.2196/53466 UR - http://www.ncbi.nlm.nih.gov/pubmed/38032695 ID - info:doi/10.2196/53466 ER - TY - JOUR AU - Spallek, Sophia AU - Birrell, Louise AU - Kershaw, Stephanie AU - Devine, Krogh Emma AU - Thornton, Louise PY - 2023/11/30 TI - Can we use ChatGPT for Mental Health and Substance Use Education? Examining Its Quality and Potential Harms JO - JMIR Med Educ SP - e51243 VL - 9 KW - artificial intelligence KW - generative artificial intelligence KW - large language models KW - ChatGPT KW - medical education KW - health education KW - patient education handout KW - preventive health services KW - educational intervention KW - mental health KW - substance use N2 - Background: The use of generative artificial intelligence, more specifically large language models (LLMs), is proliferating, and as such, it is vital to consider both the value and potential harms of its use in medical education. Their efficiency in a variety of writing styles makes LLMs, such as ChatGPT, attractive for tailoring educational materials. However, this technology can feature biases and misinformation, which can be particularly harmful in medical education settings, such as mental health and substance use education. This viewpoint investigates if ChatGPT is sufficient for 2 common health education functions in the field of mental health and substance use: (1) answering users' direct queries and (2) aiding in the development of quality consumer educational health materials. Objective: This viewpoint includes a case study to provide insight into the accessibility, biases, and quality of ChatGPT's query responses and educational health materials. We aim to provide guidance for the general public and health educators wishing to utilize LLMs. 
Methods: We collected real-world queries from 2 large-scale mental health and substance use portals and engineered a variety of prompts to use on GPT-4 Pro with the Bing BETA internet browsing plug-in. The outputs were evaluated with tools from the Sydney Health Literacy Lab to determine the accessibility, the adherence to Mindframe communication guidelines to identify biases, and author assessments on quality, including tailoring to audiences, duty of care disclaimers, and evidence-based internet references. Results: GPT-4's outputs had good face validity, but upon detailed analysis were substandard in comparison to expert-developed materials. Without engineered prompting, the reading level, adherence to communication guidelines, and use of evidence-based websites were poor. Therefore, all outputs still required cautious human editing and oversight. Conclusions: GPT-4 is currently not reliable enough for direct consumer queries, but educators and researchers can use it for creating educational materials with caution. Materials created with LLMs should disclose the use of generative artificial intelligence and be evaluated on their efficacy with the target audience. 
UR - https://mededu.jmir.org/2023/1/e51243 UR - http://dx.doi.org/10.2196/51243 UR - http://www.ncbi.nlm.nih.gov/pubmed/38032714 ID - info:doi/10.2196/51243 ER - TY - JOUR AU - Li, Jingquan PY - 2023/11/28 TI - Security Implications of AI Chatbots in Health Care JO - J Med Internet Res SP - e47551 VL - 25 KW - security KW - privacy KW - chatbot KW - AI KW - artificial intelligence KW - health information KW - HIPAA KW - ChatGPT KW - computer program KW - natural language processing KW - tool KW - improvement KW - patient care KW - care KW - data security KW - guidelines KW - risk KW - policy UR - https://www.jmir.org/2023/1/e47551 UR - http://dx.doi.org/10.2196/47551 UR - http://www.ncbi.nlm.nih.gov/pubmed/38015597 ID - info:doi/10.2196/47551 ER - TY - JOUR AU - Veras, Mirella AU - Dyer, Joseph-Omer AU - Rooney, Morgan AU - Barros Silva, Goberlânio Paulo AU - Rutherford, Derek AU - Kairy, Dahlia PY - 2023/11/24 TI - Usability and Efficacy of Artificial Intelligence Chatbots (ChatGPT) for Health Sciences Students: Protocol for a Crossover Randomized Controlled Trial JO - JMIR Res Protoc SP - e51873 VL - 12 KW - artificial intelligence KW - AI KW - health sciences KW - usability KW - learning outcomes KW - perceptions KW - OpenAI KW - ChatGPT KW - education KW - randomized controlled trial KW - RCT KW - crossover RCT N2 - Background: The integration of artificial intelligence (AI) into health sciences students' education holds significant importance. The rapid advancement of AI has opened new horizons in scientific writing and has the potential to reshape human-technology interactions. AI in education may impact critical thinking, leading to unintended consequences that need to be addressed. Understanding the implications of AI adoption in education is essential for ensuring its responsible and effective use, empowering health sciences students to navigate AI-driven technologies' evolving field with essential knowledge and skills. 
Objective: This study aims to provide details on the study protocol and the methods used to investigate the usability and efficacy of ChatGPT, a large language model. The primary focus is on assessing its role as a supplementary learning tool for improving learning processes and outcomes among undergraduate health sciences students, with a specific emphasis on chronic diseases. Methods: This single-blinded, crossover, randomized, controlled trial is part of a broader mixed methods study, and the primary emphasis of this paper is on the quantitative component of the overall research. A total of 50 students will be recruited for this study. The alternative hypothesis posits that there will be a significant difference in learning outcomes and technology usability between students using ChatGPT (group A) and those using standard web-based tools (group B) to access resources and complete assignments. Participants will be allocated to sequence AB or BA in a 1:1 ratio using computer-generated randomization. Both arms include students' participation in a writing assignment intervention, with a washout period of 21 days between interventions. The primary outcome is the measure of the technology usability and effectiveness of ChatGPT, whereas the secondary outcome is the measure of students' perceptions and experiences with ChatGPT as a learning tool. Outcome data will be collected up to 24 hours after the interventions. Results: This study aims to understand the potential benefits and challenges of incorporating AI as an educational tool, particularly in the context of student learning. The findings are expected to identify critical areas that need attention and help educators develop a deeper understanding of AI's impact on the educational field. 
By exploring the differences in the usability and efficacy between ChatGPT and conventional web-based tools, this study seeks to inform educators and students on the responsible integration of AI into academic settings, with a specific focus on health sciences education. Conclusions: By exploring the usability and efficacy of ChatGPT compared with conventional web-based tools, this study seeks to inform educators and students about the responsible integration of AI into academic settings. Trial Registration: ClinicalTrials.gov NCT05963802; https://clinicaltrials.gov/study/NCT05963802 International Registered Report Identifier (IRRID): PRR1-10.2196/51873 UR - https://www.researchprotocols.org/2023/1/e51873 UR - http://dx.doi.org/10.2196/51873 UR - http://www.ncbi.nlm.nih.gov/pubmed/37999958 ID - info:doi/10.2196/51873 ER - TY - JOUR AU - Wong, Shin-Yee Rebecca AU - Ming, Chiau Long AU - Raja Ali, Affendi Raja PY - 2023/11/21 TI - The Intersection of ChatGPT, Clinical Medicine, and Medical Education JO - JMIR Med Educ SP - e47274 VL - 9 KW - ChatGPT KW - clinical research KW - large language model KW - artificial intelligence KW - ethical considerations KW - AI KW - OpenAI UR - https://mededu.jmir.org/2023/1/e47274 UR - http://dx.doi.org/10.2196/47274 UR - http://www.ncbi.nlm.nih.gov/pubmed/37988149 ID - info:doi/10.2196/47274 ER - TY - JOUR AU - Ferreira, L. Alana AU - Chu, Brian AU - Grant-Kels, M. Jane AU - Ogunleye, Temitayo AU - Lipoff, B. 
Jules PY - 2023/11/17 TI - Evaluation of ChatGPT Dermatology Responses to Common Patient Queries JO - JMIR Dermatol SP - e49280 VL - 6 KW - ChatGPT KW - dermatology KW - dermatologist KW - artificial intelligence KW - AI KW - medical advice KW - GPT-4 KW - patient queries KW - information resource KW - response evaluation KW - skin condition KW - skin KW - tool KW - AI tool UR - https://derma.jmir.org/2023/1/e49280 UR - http://dx.doi.org/10.2196/49280 UR - http://www.ncbi.nlm.nih.gov/pubmed/37976093 ID - info:doi/10.2196/49280 ER - TY - JOUR AU - Cheng, L. Abby AU - Agarwal, Mansi AU - Armbrecht, A. Melissa AU - Abraham, Joanna AU - Calfee, P. Ryan AU - Goss, W. Charles PY - 2023/11/17 TI - Behavioral Mechanisms That Mediate Mental and Physical Health Improvements in People With Chronic Pain Who Receive a Digital Health Intervention: Prospective Cohort Pilot Study JO - JMIR Form Res SP - e51422 VL - 7 KW - digital mental health intervention KW - chronic musculoskeletal pain KW - anxiety KW - depression KW - pain interference KW - physical function KW - behavioral activation KW - pain acceptance KW - sleep quality KW - mediation analysis KW - behavioral mechanism KW - chronic pain KW - digital health intervention KW - mobile phone N2 - Background: Preliminary evidence suggests that digital mental health intervention (Wysa for Chronic Pain) can improve mental and physical health in people with chronic musculoskeletal pain and coexisting symptoms of depression or anxiety. However, the behavioral mechanisms through which this intervention acts are not fully understood. Objective: The purpose of this study was to identify behavioral mechanisms that may mediate changes in mental and physical health associated with use of Wysa for Chronic Pain during orthopedic management of chronic musculoskeletal pain. We hypothesized that improved behavioral activation, pain acceptance, and sleep quality mediate improvements in self-reported mental and physical health. 
Methods: In this prospective cohort, pilot mediation analysis, adults with chronic (≥3 months) neck or back pain received the Wysa for Chronic Pain digital intervention, which uses a conversational agent and text-based access to human counselors to deliver cognitive behavioral therapy and related therapeutic content. Patient-reported outcomes and proposed mediators were collected at baseline and 1 month. The exposure of interest was participants' engagement (ie, total interactions) with the digital intervention. Proposed mediators were assessed using the Behavioral Activation for Depression Scale-Short Form, Chronic Pain Acceptance Questionnaire, and Athens Insomnia Scale. Outcomes included Patient-Reported Outcomes Measurement Information System Anxiety, Depression, Pain Interference, and Physical Function scores. A mediation analysis was conducted using the Baron and Kenny method, adjusting for age, sex, and baseline mediators and outcome values. P<.20 was considered significant for this pilot study. Results: Among 30 patients (mean age 59, SD 14 years; 21 [70%] female), the mediation effect of behavioral activation on the relationship between increased intervention engagement and improved anxiety symptoms met predefined statistical significance thresholds (indirect effect −0.4, 80% CI −0.7 to −0.1; P=.13, 45% of the total effect). The direction of mediation effect was generally consistent with our hypothesis for all other proposed mediator or outcome relationships, as well. Conclusions: In a full-sized randomized controlled trial of patients with chronic musculoskeletal pain, behavioral activation, pain acceptance, and sleep quality may play an important role in mediating the relationship between use of a digital mental health intervention (Wysa for Chronic Pain) and improved mental and physical health. 
Trial Registration: ClinicalTrials.gov NCT05194722; https://clinicaltrials.gov/ct2/show/NCT05194722 UR - https://formative.jmir.org/2023/1/e51422 UR - http://dx.doi.org/10.2196/51422 UR - http://www.ncbi.nlm.nih.gov/pubmed/37976097 ID - info:doi/10.2196/51422 ER - TY - JOUR AU - Ettman, K. Catherine AU - Galea, Sandro PY - 2023/11/16 TI - The Potential Influence of AI on Population Mental Health JO - JMIR Ment Health SP - e49936 VL - 10 KW - mental health KW - artificial intelligence KW - AI KW - policy KW - policies KW - population health KW - population KW - ChatGPT KW - generative KW - tools KW - digital mental health UR - https://mental.jmir.org/2023/1/e49936 UR - http://dx.doi.org/10.2196/49936 UR - http://www.ncbi.nlm.nih.gov/pubmed/37971803 ID - info:doi/10.2196/49936 ER - TY - JOUR AU - Gödde, Daniel AU - Nöhl, Sophia AU - Wolf, Carina AU - Rupert, Yannick AU - Rimkus, Lukas AU - Ehlers, Jan AU - Breuckmann, Frank AU - Sellmann, Timur PY - 2023/11/16 TI - A SWOT (Strengths, Weaknesses, Opportunities, and Threats) Analysis of ChatGPT in the Medical Literature: Concise Review JO - J Med Internet Res SP - e49368 VL - 25 KW - ChatGPT KW - chatbot KW - artificial intelligence KW - education technology KW - medical education KW - machine learning KW - chatbots KW - concise review KW - review methods KW - review methodology KW - SWOT N2 - Background: ChatGPT is a 175-billion-parameter natural language processing model that is already involved in scientific content and publications. Its influence ranges from providing quick access to information on medical topics, assisting in generating medical and scientific articles and papers, performing medical data analyses, and even interpreting complex data sets. Objective: The future role of ChatGPT remains uncertain and a matter of debate already shortly after its release. This review aimed to analyze the role of ChatGPT in the medical literature during the first 3 months after its release. 
Methods: We performed a concise review of literature published in PubMed from December 1, 2022, to March 31, 2023. To find all publications related to ChatGPT or considering ChatGPT, the search term was kept simple ("ChatGPT" in AllFields). All publications available as full text in German or English were included. All accessible publications were evaluated according to specifications by the author team (eg, impact factor, publication modus, article type, publication speed, and type of ChatGPT integration or content). The conclusions of the articles were used for later SWOT (strengths, weaknesses, opportunities, and threats) analysis. All data were analyzed on a descriptive basis. Results: Of 178 studies in total, 160 met the inclusion criteria and were evaluated. The average impact factor was 4.423 (range 0-96.216), and the average publication speed was 16 (range 0-83) days. Among the articles, there were 77 editorials (48.1%), 43 essays (26.9%), 21 studies (13.1%), 6 reviews (3.8%), 6 case reports (3.8%), 6 news (3.8%), and 1 meta-analysis (0.6%). Of those, 54.4% (n=87) were published as open access, with 5% (n=8) provided on preprint servers. Over 400 quotes with information on strengths, weaknesses, opportunities, and threats were detected. By far, most (n=142, 34.8%) were related to weaknesses. ChatGPT excels in its ability to express ideas clearly and formulate general contexts comprehensibly. It performs so well that even experts in the field have difficulty identifying abstracts generated by ChatGPT. However, the time-limited scope and the need for corrections by experts were mentioned as weaknesses and threats of ChatGPT. Opportunities include assistance in formulating medical issues for nonnative English speakers, as well as the possibility of timely participation in the development of such artificial intelligence tools since it is in its early stages and can therefore still be influenced. 
Conclusions: Artificial intelligence tools such as ChatGPT are already part of the medical publishing landscape. Despite their apparent opportunities, policies and guidelines must be implemented to ensure benefits in education, clinical practice, and research and protect against threats such as scientific misconduct, plagiarism, and inaccuracy. UR - https://www.jmir.org/2023/1/e49368 UR - http://dx.doi.org/10.2196/49368 UR - http://www.ncbi.nlm.nih.gov/pubmed/37865883 ID - info:doi/10.2196/49368 ER - TY - JOUR AU - Lakdawala, Nehal AU - Channa, Leelakrishna AU - Gronbeck, Christian AU - Lakdawala, Nikita AU - Weston, Gillian AU - Sloan, Brett AU - Feng, Hao PY - 2023/11/14 TI - Assessing the Accuracy and Comprehensiveness of ChatGPT in Offering Clinical Guidance for Atopic Dermatitis and Acne Vulgaris JO - JMIR Dermatol SP - e50409 VL - 6 KW - ChatGPT KW - artificial intelligence KW - dermatology KW - clinical guidance KW - counseling KW - atopic dermatitis KW - acne vulgaris KW - skin KW - acne KW - dermatitis KW - NLP KW - natural language processing KW - dermatologic KW - dermatological KW - recommendation KW - recommendations KW - guidance KW - advise KW - counsel KW - response KW - responses KW - chatbot KW - chatbots KW - conversational agent KW - conversational agents KW - answer KW - answers KW - computer generated KW - automated UR - https://derma.jmir.org/2023/1/e50409 UR - http://dx.doi.org/10.2196/50409 UR - http://www.ncbi.nlm.nih.gov/pubmed/37962920 ID - info:doi/10.2196/50409 ER - TY - JOUR AU - Scherr, Riley AU - Halaseh, F. 
Faris AU - Spina, Aidin AU - Andalib, Saman AU - Rivera, Ronald PY - 2023/11/10 TI - ChatGPT Interactive Medical Simulations for Early Clinical Education: Case Study JO - JMIR Med Educ SP - e49877 VL - 9 KW - ChatGPT KW - medical school simulations KW - preclinical curriculum KW - artificial intelligence KW - AI KW - AI in medical education KW - medical education KW - simulation KW - generative KW - curriculum KW - clinical education KW - simulations N2 - Background: The transition to clinical clerkships can be difficult for medical students, as it requires the synthesis and application of preclinical information into diagnostic and therapeutic decisions. ChatGPT (a generative language model with many medical applications due to its creativity, memory, and accuracy) can help students in this transition. Objective: This paper models ChatGPT 3.5's ability to perform interactive clinical simulations and shows this tool's benefit to medical education. Methods: Simulation starting prompts were refined using ChatGPT 3.5 in Google Chrome. Starting prompts were selected based on assessment format, stepwise progression of simulation events and questions, free-response question type, responsiveness to user inputs, postscenario feedback, and medical accuracy of the feedback. The chosen scenarios were advanced cardiac life support and medical intensive care (for sepsis and pneumonia). Results: Two starting prompts were chosen. Prompt 1 was developed through 3 test simulations and used successfully in 2 simulations. Prompt 2 was developed through 10 additional test simulations and used successfully in 1 simulation. Conclusions: ChatGPT is capable of creating simulations for early clinical education. These simulations let students practice novel parts of the clinical curriculum, such as forming independent diagnostic and therapeutic impressions over an entire patient encounter. 
Furthermore, the simulations can adapt to user inputs in a way that replicates real life more accurately than premade question bank clinical vignettes. Finally, ChatGPT can create potentially unlimited free simulations with specific feedback, which increases access for medical students with lower socioeconomic status and underresourced medical schools. However, no tool is perfect, and ChatGPT is no exception; there are concerns about simulation accuracy and replicability that need to be addressed to further optimize ChatGPT's performance as an educational resource. UR - https://mededu.jmir.org/2023/1/e49877 UR - http://dx.doi.org/10.2196/49877 UR - http://www.ncbi.nlm.nih.gov/pubmed/37948112 ID - info:doi/10.2196/49877 ER - TY - JOUR AU - Gomaa, Sameh AU - Posey, James AU - Bashir, Babar AU - Basu Mallick, Atrayee AU - Vanderklok, Eleanor AU - Schnoll, Max AU - Zhan, Tingting AU - Wen, Kuang-Yi PY - 2023/11/10 TI - Feasibility of a Text Messaging-Integrated and Chatbot-Interfaced Self-Management Program for Symptom Control in Patients With Gastrointestinal Cancer Undergoing Chemotherapy: Pilot Mixed Methods Study JO - JMIR Form Res SP - e46128 VL - 7 KW - chemotherapy KW - gastrointestinal cancer KW - digital health KW - text messaging KW - chatbot KW - side effect management N2 - Background: Outpatient chemotherapy often leaves patients to grapple with a range of complex side effects at home. Leveraging tailored evidence-based content to monitor and manage these symptoms remains an untapped potential among patients with gastrointestinal (GI) cancer. Objective: This study aims to bridge the gap in outpatient chemotherapy care by integrating a cutting-edge text messaging system with a chatbot interface. This approach seeks to enable real-time monitoring and proactive management of side effects in patients with GI cancer undergoing intravenous chemotherapy. 
Methods: Real-Time Chemotherapy-Associated Side Effects Monitoring Supportive System (RT-CAMSS) was developed iteratively, incorporating patient-centered inputs and evidence-based information. It synthesizes chemotherapy knowledge, self-care symptom management skills, emotional support, and healthy lifestyle recommendations. In a single-arm 2-month pilot study, patients with GI cancer undergoing chemotherapy received tailored intervention messages thrice a week and a weekly Patient-Reported Outcomes Version of the Common Terminology Criteria for Adverse Events-based symptom assessment via a chatbot interface. Baseline and postintervention patient surveys and interviews were conducted. Results: Out of 45 eligible patients, 34 were enrolled (76% consent rate). The mean age was 61 (SD 12) years, with 19 (56%) being females and 21 (62%) non-Hispanic White. The most common cancer type was pancreatic (n=18, 53%), followed by colon (n=12, 35%) and stomach (n=4, 12%). In total, 27 (79% retention rate) participants completed the postintervention follow-up. In total, 20 patients texted back at least once to seek additional information, with the keyword "chemo" or "support" texted the most. Among those who used the chatbot system checker, fatigue emerged as the most frequently reported symptom (n=15), followed by neuropathy (n=7). Adjusted for multiple comparisons, patients engaging with the platform exhibited significantly improved Patient Activation Measure (3.70, 95% CI −6.919 to −0.499; P=.02). Postintervention interviews and satisfaction surveys revealed that participants found the intervention user-friendly and felt they were provided with valuable information. Conclusions: Capitalizing on mobile technology communication holds tremendous scalability for enhancing health care services. This study presents initial evidence of the engagement and acceptability of RT-CAMSS, warranting further evaluation in a controlled clinical trial setting. 
UR - https://formative.jmir.org/2023/1/e46128 UR - http://dx.doi.org/10.2196/46128 UR - http://www.ncbi.nlm.nih.gov/pubmed/37948108 ID - info:doi/10.2196/46128 ER - TY - JOUR AU - Abuyaman, Omar PY - 2023/11/10 TI - Strengths and Weaknesses of ChatGPT Models for Scientific Writing About Medical Vitamin B12: Mixed Methods Study JO - JMIR Form Res SP - e49459 VL - 7 KW - AI KW - ChatGPT KW - GPT-4 KW - GPT-3.5 KW - vitamin B12 KW - artificial intelligence KW - language editing KW - wide range information KW - AI solutions KW - scientific content N2 - Background: ChatGPT is a large language model developed by OpenAI designed to generate human-like responses to prompts. Objective: This study aims to evaluate the ability of GPT-4 to generate scientific content and assist in scientific writing using medical vitamin B12 as the topic. Furthermore, the study will compare the performance of GPT-4 to its predecessor, GPT-3.5. Methods: The study examined responses from GPT-4 and GPT-3.5 to vitamin B12-related prompts, focusing on their quality and characteristics and comparing them to established scientific literature. Results: The results indicated that GPT-4 can potentially streamline scientific writing through its ability to edit language and write abstracts, keywords, and abbreviation lists. However, significant limitations of ChatGPT were revealed, including its inability to identify and address bias, inability to include recent information, lack of transparency, and inclusion of inaccurate information. Additionally, it cannot check for plagiarism or provide proper references. The accuracy of GPT-4's answers was found to be superior to GPT-3.5. Conclusions: ChatGPT can be considered a helpful assistant in the writing process but not a replacement for a scientist's expertise. Researchers must remain aware of its limitations and use it appropriately. 
The improvements in consecutive ChatGPT versions suggest the possibility of overcoming some present limitations in the near future. UR - https://formative.jmir.org/2023/1/e49459 UR - http://dx.doi.org/10.2196/49459 UR - http://www.ncbi.nlm.nih.gov/pubmed/37948100 ID - info:doi/10.2196/49459 ER - TY - JOUR AU - Sun, Haonan AU - Zhang, Kai AU - Lan, Wei AU - Gu, Qiufeng AU - Jiang, Guangxiang AU - Yang, Xue AU - Qin, Wanli AU - Han, Dongran PY - 2023/11/9 TI - An AI Dietitian for Type 2 Diabetes Mellitus Management Based on Large Language and Image Recognition Models: Preclinical Concept Validation Study JO - J Med Internet Res SP - e51300 VL - 25 KW - ChatGPT KW - artificial intelligence KW - AI KW - diabetes KW - diabetic KW - nutrition KW - nutritional KW - diet KW - dietary KW - dietician KW - medical nutrition therapy KW - ingredient recognition KW - digital health KW - language model KW - image recognition KW - machine learning KW - deep learning KW - NLP KW - natural language processing KW - meal KW - recommendation KW - meals KW - food KW - GPT 4.0 N2 - Background: Nutritional management for patients with diabetes in China is a significant challenge due to the low supply of registered clinical dietitians. To address this, an artificial intelligence (AI)-based nutritionist program that uses advanced language and image recognition models was created. This program can identify ingredients from images of a patient's meal and offer nutritional guidance and dietary recommendations. Objective: The primary objective of this study is to evaluate the competence of the models that support this program. Methods: The potential of an AI nutritionist program for patients with type 2 diabetes mellitus (T2DM) was evaluated through a multistep process. First, a survey was conducted among patients with T2DM and endocrinologists to identify knowledge gaps in dietary practices. 
ChatGPT and GPT 4.0 were then tested through the Chinese Registered Dietitian Examination to assess their proficiency in providing evidence-based dietary advice. ChatGPT's responses to common questions about medical nutrition therapy were compared with expert responses by professional dietitians to evaluate its proficiency. The model's food recommendations were scrutinized for consistency with expert advice. A deep learning-based image recognition model was developed for food identification at the ingredient level, and its performance was compared with existing models. Finally, a user-friendly app was developed, integrating the capabilities of language and image recognition models to potentially improve care for patients with T2DM. Results: Most patients (182/206, 88.4%) demanded more immediate and comprehensive nutritional management and education. Both ChatGPT and GPT 4.0 passed the Chinese Registered Dietitian examination. ChatGPT's food recommendations were mainly in line with best practices, except for certain foods like root vegetables and dry beans. Professional dietitians' reviews of ChatGPT's responses to common questions were largely positive, with 162 out of 168 providing favorable reviews. The multilabel image recognition model evaluation showed that the Dino V2 model achieved an average F1 score of 0.825, indicating high accuracy in recognizing ingredients. Conclusions: The model evaluations were promising. The AI-based nutritionist program is now ready for a supervised pilot study. 
UR - https://www.jmir.org/2023/1/e51300 UR - http://dx.doi.org/10.2196/51300 UR - http://www.ncbi.nlm.nih.gov/pubmed/37943581 ID - info:doi/10.2196/51300 ER - TY - JOUR AU - Beyeler, Marina AU - Légeret, Corinne AU - Kiwitz, Fabian AU - van der Horst, Klazine PY - 2023/11/8 TI - Usability and Overall Perception of a Health Bot for Nutrition-Related Questions for Patients Receiving Bariatric Care: Mixed Methods Study JO - JMIR Hum Factors SP - e47913 VL - 10 KW - bariatric surgery KW - nutrition information KW - usability KW - satisfaction KW - artificial intelligence KW - health bot KW - mobile phone N2 - Background: Currently, over 4000 bariatric procedures are performed annually in Switzerland. To improve outcomes, patients need to have good knowledge regarding postoperative nutrition. To potentially provide them with knowledge between dietetic consultations, a health bot (HB) was created. The HB can answer bariatric nutrition questions in writing based on artificial intelligence. Objective: This study aims to evaluate the usability and perception of the HB among patients receiving bariatric care. Methods: Patients before or after bariatric surgery tested the HB. A mixed methods approach was used, which consisted of a questionnaire and qualitative interviews before and after testing the HB. The dimensions usability of, usefulness of, satisfaction with, and ease of use of the HB, among others, were measured. Data were analyzed using R Studio (R Studio Inc) and Excel (Microsoft Corp). The interviews were transcribed and a summary inductive content analysis was performed. Results: A total of 12 patients (female: n=8, 67%; male: n=4, 33%) were included. The results showed excellent usability with a mean usability score of 87 (SD 12.5; range 57.5-100) out of 100. Other dimensions of acceptability included usefulness (mean 5.28, SD 2.02 out of 7), satisfaction (mean 5.75, SD 1.68 out of 7), and learnability (mean 6.26, SD 1.5 out of 7). 
The concept of the HB and availability of reliable nutrition information were perceived as desirable (mean 5.5, SD 1.64 out of 7). Weaknesses were identified in the response accuracy, limited knowledge, and design of the HB. Conclusions: The HB's ease of use and usability were evaluated to be positive; response accuracy, topic selection, and design should be optimized in a next step. The perceptions of nutrition professionals and the impact on patient care and the nutrition knowledge of participants need to be examined in further studies. UR - https://humanfactors.jmir.org/2023/1/e47913 UR - http://dx.doi.org/10.2196/47913 UR - http://www.ncbi.nlm.nih.gov/pubmed/37938894 ID - info:doi/10.2196/47913 ER - TY - JOUR AU - Lv, Nan AU - Kannampallil, Thomas AU - Xiao, Lan AU - Ronneberg, R. Corina AU - Kumar, Vikas AU - Wittels, E. Nancy AU - Ajilore, A. Olusola AU - Smyth, M. Joshua AU - Ma, Jun PY - 2023/11/6 TI - Association Between User Interaction and Treatment Response of a Voice-Based Coach for Treating Depression and Anxiety: Secondary Analysis of a Pilot Randomized Controlled Trial JO - JMIR Hum Factors SP - e49715 VL - 10 KW - user interaction KW - treatment alliance KW - treatment response KW - voice assistant KW - depression KW - anxiety N2 - Background: The quality of user interaction with therapeutic tools has been positively associated with treatment response; however, no studies have investigated these relationships for voice-based digital tools. Objective: This study evaluated the relationships between objective and subjective user interaction measures as well as treatment response on Lumen, a novel voice-based coach, delivering problem-solving treatment to patients with mild to moderate depression or anxiety or both.
Methods: In a pilot trial, 42 adults with clinically significant depression (Patient Health Questionnaire-9 [PHQ-9]) or anxiety (7-item Generalized Anxiety Disorder Scale [GAD-7]) symptoms or both received Lumen, a voice-based coach delivering 8 problem-solving treatment sessions. Objective (number of conversational breakdowns, ie, instances where a participant's voice input could not be interpreted by Lumen) and subjective user interaction measures (task-related workload, user experience, and treatment alliance) were obtained for each session. Changes in PHQ-9 and GAD-7 scores at each ensuing session after session 1 measured the treatment response. Results: Participants were 38.9 (SD 12.9) years old, 28 (67%) were women, 8 (19%) were Black, 12 (29%) were Latino, 5 (12%) were Asian, and 28 (67%) had a high school or college education. Mean (SD) across sessions showed breakdowns (mean 6.5, SD 4.4 to mean 2.3, SD 1.8) decreasing over sessions, favorable task-related workload (mean 14.5, SD 5.6 to mean 17.6, SD 5.6) over sessions, neutral-to-positive user experience (mean 0.5, SD 1.4 to mean 1.1, SD 1.3), and high treatment alliance (mean 5.0, SD 1.4 to mean 5.3, SD 0.9). PHQ-9 (Ptrend=.001) and GAD-7 scores (Ptrend=.01) improved significantly over sessions. Treatment alliance correlated with improvements in PHQ-9 (Pearson r=-0.02 to -0.46) and GAD-7 (r=0.03 to -0.57) scores across sessions, whereas breakdowns and task-related workload did not. Mixed models showed that participants with higher individual mean treatment alliance had greater improvements in PHQ-9 (β=-1.13, 95% CI -2.16 to -0.10) and GAD-7 (β=-1.17, 95% CI -2.13 to -0.20) scores. Conclusions: The participants had fewer conversational breakdowns and largely favorable user interactions with Lumen across sessions. Conversational breakdowns were not associated with subjective user interaction measures or treatment responses, highlighting how participants adapted and effectively used Lumen.
Individuals experiencing higher treatment alliance had greater improvements in depression and anxiety. Understanding treatment alliance can provide insights on improving treatment response for this new delivery modality, which provides accessibility, flexibility, comfort with disclosure, and cost-related advantages compared to conventional psychotherapy. Trial Registration: ClinicalTrials.gov NCT04524104; https://clinicaltrials.gov/study/NCT04524104 UR - https://humanfactors.jmir.org/2023/1/e49715 UR - http://dx.doi.org/10.2196/49715 UR - http://www.ncbi.nlm.nih.gov/pubmed/37930781 ID - info:doi/10.2196/49715 ER - TY - JOUR AU - Ito, Naoki AU - Kadomatsu, Sakina AU - Fujisawa, Mineto AU - Fukaguchi, Kiyomitsu AU - Ishizawa, Ryo AU - Kanda, Naoki AU - Kasugai, Daisuke AU - Nakajima, Mikio AU - Goto, Tadahiro AU - Tsugawa, Yusuke PY - 2023/11/2 TI - The Accuracy and Potential Racial and Ethnic Biases of GPT-4 in the Diagnosis and Triage of Health Conditions: Evaluation Study JO - JMIR Med Educ SP - e47532 VL - 9 KW - GPT-4 KW - racial and ethnic bias KW - typical clinical vignettes KW - diagnosis KW - triage KW - artificial intelligence KW - AI KW - race KW - clinical vignettes KW - physician KW - efficiency KW - decision-making KW - bias KW - GPT N2 - Background: Whether GPT-4, the conversational artificial intelligence, can accurately diagnose and triage health conditions and whether it presents racial and ethnic biases in its decisions remain unclear. Objective: We aim to assess the accuracy of GPT-4 in the diagnosis and triage of health conditions and whether its performance varies by patient race and ethnicity. Methods: We compared the performance of GPT-4 and physicians, using 45 typical clinical vignettes, each with a correct diagnosis and triage level, in February and March 2023. For each of the 45 clinical vignettes, GPT-4 and 3 board-certified physicians provided the most likely primary diagnosis and triage level (emergency, nonemergency, or self-care). 
Independent reviewers evaluated the diagnoses as "correct" or "incorrect." Physician diagnosis was defined as the consensus of the 3 physicians. We evaluated whether the performance of GPT-4 varies by patient race and ethnicity, by adding the information on patient race and ethnicity to the clinical vignettes. Results: The accuracy of diagnosis was comparable between GPT-4 and physicians (the percentage of correct diagnosis was 97.8% (44/45; 95% CI 88.2%-99.9%) for GPT-4 and 91.1% (41/45; 95% CI 78.8%-97.5%) for physicians; P=.38). GPT-4 provided appropriate reasoning for 97.8% (44/45) of the vignettes. The appropriateness of triage was comparable between GPT-4 and physicians (GPT-4: 30/45, 66.7%; 95% CI 51.0%-80.0%; physicians: 30/45, 66.7%; 95% CI 51.0%-80.0%; P=.99). The performance of GPT-4 in diagnosing health conditions did not vary among different races and ethnicities (Black, White, Asian, and Hispanic), with an accuracy of 100% (95% CI 78.2%-100%). P values, compared to the GPT-4 output without incorporating race and ethnicity information, were all .99. The accuracy of triage was not significantly different even if patients' race and ethnicity information was added. The accuracy of triage was 62.2% (95% CI 46.5%-76.2%; P=.50) for Black patients; 66.7% (95% CI 51.0%-80.0%; P=.99) for White patients; 66.7% (95% CI 51.0%-80.0%; P=.99) for Asian patients, and 62.2% (95% CI 46.5%-76.2%; P=.69) for Hispanic patients. P values were calculated by comparing the outputs with and without conditioning on race and ethnicity. Conclusions: GPT-4's ability to diagnose and triage typical clinical vignettes was comparable to that of board-certified physicians. The performance of GPT-4 did not vary by patient race and ethnicity. These findings should be informative for health systems looking to introduce conversational artificial intelligence to improve the efficiency of patient diagnosis and triage.
UR - https://mededu.jmir.org/2023/1/e47532 UR - http://dx.doi.org/10.2196/47532 UR - http://www.ncbi.nlm.nih.gov/pubmed/37917120 ID - info:doi/10.2196/47532 ER - TY - JOUR AU - Martinengo, Laura AU - Lin, Xiaowen AU - Jabir, Ishqi Ahmad AU - Kowatsch, Tobias AU - Atun, Rifat AU - Car, Josip AU - Tudor Car, Lorainne PY - 2023/11/1 TI - Conversational Agents in Health Care: Expert Interviews to Inform the Definition, Classification, and Conceptual Framework JO - J Med Internet Res SP - e50767 VL - 25 KW - conceptual framework KW - conversational agent KW - chatbot KW - mobile health KW - mHealth KW - digital health KW - expert interview KW - mobile phone N2 - Background: Conversational agents (CAs), or chatbots, are computer programs that simulate conversations with humans. The use of CAs in health care settings is recent and rapidly increasing, which often translates to poor reporting of the CA development and evaluation processes and unreliable research findings. We developed and published a conceptual framework, designing, developing, evaluating, and implementing a smartphone-delivered, rule-based conversational agent (DISCOVER), consisting of 3 iterative stages of CA design, development, and evaluation and implementation, complemented by 2 cross-cutting themes (user-centered design and data privacy and security). Objective: This study aims to perform in-depth, semistructured interviews with multidisciplinary experts in health care CAs to share their views on the definition and classification of health care CAs and evaluate and validate the DISCOVER conceptual framework. Methods: We conducted one-on-one semistructured interviews via Zoom (Zoom Video Communications) with 12 multidisciplinary CA experts using an interview guide based on our framework. The interviews were audio recorded, transcribed by the research team, and analyzed using thematic analysis. Results: Following participants'
input, we defined CAs as digital interfaces that use natural language to engage in a synchronous dialogue using ≥1 communication modality, such as text, voice, images, or video. CAs were classified by 13 categories: response generation method, input and output modalities, CA purpose, deployment platform, CA development modality, appearance, length of interaction, type of CA-user interaction, dialogue initiation, communication style, CA personality, human support, and type of health care intervention. Experts considered that the conceptual framework could be adapted for artificial intelligence-based CAs. However, despite recent advances in artificial intelligence, including large language models, the technology is not able to ensure safety and reliability in health care settings. Finally, aligned with participants' feedback, we present an updated iteration of the conceptual framework for health care conversational agents (CHAT) with key considerations for CA design, development, and evaluation and implementation, complemented by 3 cross-cutting themes: ethics, user involvement, and data privacy and security. Conclusions: We present an expanded, validated CHAT and aim at guiding researchers from a variety of backgrounds and with different levels of expertise in the design, development, and evaluation and implementation of rule-based CAs in health care settings.
UR - https://www.jmir.org/2023/1/e50767 UR - http://dx.doi.org/10.2196/50767 UR - http://www.ncbi.nlm.nih.gov/pubmed/37910153 ID - info:doi/10.2196/50767 ER - TY - JOUR AU - Baglivo, Francesco AU - De Angelis, Luigi AU - Casigliani, Virginia AU - Arzilli, Guglielmo AU - Privitera, Pierpaolo Gaetano AU - Rizzo, Caterina PY - 2023/11/1 TI - Exploring the Possible Use of AI Chatbots in Public Health Education: Feasibility Study JO - JMIR Med Educ SP - e51421 VL - 9 KW - artificial intelligence KW - chatbots KW - medical education KW - vaccination KW - public health KW - medical students KW - large language model KW - generative AI KW - ChatGPT KW - Google Bard KW - AI chatbot KW - health education KW - health care KW - medical training KW - educational support tool KW - chatbot model N2 - Background: Artificial intelligence (AI) is a rapidly developing field with the potential to transform various aspects of health care and public health, including medical training. During the "Hygiene and Public Health" course for fifth-year medical students, a practical training session was conducted on vaccination using AI chatbots as an educational supportive tool. Before receiving specific training on vaccination, the students were given a web-based test extracted from the Italian National Medical Residency Test. After completing the test, a critical correction of each question was performed assisted by AI chatbots. Objective: The main aim of this study was to identify whether AI chatbots can be considered educational support tools for training in public health. The secondary objective was to assess the performance of different AI chatbots on complex multiple-choice medical questions in the Italian language.
Methods: A test composed of 15 multiple-choice questions on vaccination was extracted from the Italian National Medical Residency Test using targeted keywords and administered to medical students via Google Forms and to different AI chatbot models (Bing Chat, ChatGPT, Chatsonic, Google Bard, and YouChat). The correction of the test was conducted in the classroom, focusing on the critical evaluation of the explanations provided by the chatbot. A Mann-Whitney U test was conducted to compare the performances of medical students and AI chatbots. Student feedback was collected anonymously at the end of the training experience. Results: In total, 36 medical students and 5 AI chatbot models completed the test. The students achieved an average score of 8.22 (SD 2.65) out of 15, while the AI chatbots scored an average of 12.22 (SD 2.77). The results indicated a statistically significant difference in performance between the 2 groups (U=49.5, P<.001), with a large effect size (r=0.69). When divided by question type (direct, scenario-based, and negative), significant differences were observed in direct (P<.001) and scenario-based (P<.001) questions, but not in negative questions (P=.48). The students reported a high level of satisfaction (7.9/10) with the educational experience, expressing a strong desire to repeat the experience (7.6/10). Conclusions: This study demonstrated the efficacy of AI chatbots in answering complex medical questions related to vaccination and providing valuable educational support. Their performance significantly surpassed that of medical students in direct and scenario-based questions. The responsible and critical use of AI chatbots can enhance medical education, making it an essential aspect to integrate into the educational system. 
UR - https://mededu.jmir.org/2023/1/e51421 UR - http://dx.doi.org/10.2196/51421 UR - http://www.ncbi.nlm.nih.gov/pubmed/37910155 ID - info:doi/10.2196/51421 ER - TY - JOUR AU - Wilhelm, Isabelle Theresa AU - Roos, Jonas AU - Kaczmarczyk, Robert PY - 2023/10/30 TI - Large Language Models for Therapy Recommendations Across 3 Clinical Specialties: Comparative Study JO - J Med Internet Res SP - e49324 VL - 25 KW - dermatology KW - ophthalmology KW - orthopedics KW - therapy KW - large language models KW - artificial intelligence KW - LLM KW - ChatGPT KW - chatbot KW - chatbots KW - orthopedic KW - recommendation KW - recommendations KW - medical information KW - health information KW - quality KW - reliability KW - accuracy KW - safety KW - reliable KW - medical advice N2 - Background: As advancements in artificial intelligence (AI) continue, large language models (LLMs) have emerged as promising tools for generating medical information. Their rapid adaptation and potential benefits in health care require rigorous assessment in terms of the quality, accuracy, and safety of the generated information across diverse medical specialties. Objective: This study aimed to evaluate the performance of 4 prominent LLMs, namely, Claude-instant-v1.0, GPT-3.5-Turbo, Command-xlarge-nightly, and Bloomz, in generating medical content spanning the clinical specialties of ophthalmology, orthopedics, and dermatology. Methods: Three domain-specific physicians evaluated the AI-generated therapeutic recommendations for a diverse set of 60 diseases. The evaluation criteria involved the mDISCERN score, correctness, and potential harmfulness of the recommendations. ANOVA and pairwise t tests were used to explore discrepancies in content quality and safety across models and specialties. Additionally, using the capabilities of OpenAI's most advanced model, GPT-4, an automated evaluation of each model's responses to the diseases was performed using the same criteria and compared to the physicians'
assessments through Pearson correlation analysis. Results: Claude-instant-v1.0 emerged with the highest mean mDISCERN score (3.35, 95% CI 3.23-3.46). In contrast, Bloomz lagged with the lowest score (1.07, 95% CI 1.03-1.10). Our analysis revealed significant differences among the models in terms of quality (P<.001). Evaluating their reliability, the models displayed strong contrasts in their falseness ratings, with variations both across models (P<.001) and specialties (P<.001). Distinct error patterns emerged, such as confusing diagnoses; providing vague, ambiguous advice; or omitting critical treatments, such as antibiotics for infectious diseases. Regarding potential harm, GPT-3.5-Turbo was found to be the safest, with the lowest harmfulness rating. All models lagged in detailing the risks associated with treatment procedures, explaining the effects of therapies on quality of life, and offering additional sources of information. Pearson correlation analysis underscored a substantial alignment between physician assessments and GPT-4's evaluations across all established criteria (P<.01). Conclusions: This study, while comprehensive, was limited by the involvement of a select number of specialties and physician evaluators. The straightforward prompting strategy ("How to treat…") and the assessment benchmarks, initially conceptualized for human-authored content, might have potential gaps in capturing the nuances of AI-driven information. The LLMs evaluated showed a notable capability in generating valuable medical content; however, evident lapses in content quality and potential harm signal the need for further refinements. Given the dynamic landscape of LLMs, this study's findings emphasize the need for regular and methodical assessments, oversight, and fine-tuning of these AI tools to ensure they produce consistently trustworthy and clinically safe medical advice.
Notably, the introduction of an auto-evaluation mechanism using GPT-4, as detailed in this study, provides a scalable, transferable method for domain-agnostic evaluations, extending beyond therapy recommendation assessments. UR - https://www.jmir.org/2023/1/e49324 UR - http://dx.doi.org/10.2196/49324 UR - http://www.ncbi.nlm.nih.gov/pubmed/37902826 ID - info:doi/10.2196/49324 ER - TY - JOUR AU - Kunitsu, Yuki PY - 2023/10/30 TI - The Potential of GPT-4 as a Support Tool for Pharmacists: Analytical Study Using the Japanese National Examination for Pharmacists JO - JMIR Med Educ SP - e48452 VL - 9 KW - natural language processing KW - generative pretrained transformer KW - GPT-4 KW - ChatGPT KW - artificial intelligence KW - AI KW - chatbot KW - pharmacy KW - pharmacist N2 - Background: The advancement of artificial intelligence (AI), as well as machine learning, has led to its application in various industries, including health care. AI chatbots, such as GPT-4, developed by OpenAI, have demonstrated potential in supporting health care professionals by providing medical information, answering examination questions, and assisting in medical education. However, the applicability of GPT-4 in the field of pharmacy remains unexplored. Objective: This study aimed to evaluate GPT-4's ability to answer questions from the Japanese National Examination for Pharmacists (JNEP) and assess its potential as a support tool for pharmacists in their daily practice. Methods: The question texts and answer choices from the 107th and 108th JNEP, held in February 2022 and February 2023, were input into GPT-4. As GPT-4 cannot process diagrams, questions that included diagram interpretation were not analyzed and were initially given a score of 0. The correct answer rates were calculated and compared with the passing criteria of each examination to evaluate GPT-4's performance.
Results: For the 107th and 108th JNEP, GPT-4 achieved an accuracy rate of 64.5% (222/344) and 62.9% (217/345), respectively, for all questions. When considering only the questions that GPT-4 could answer, the accuracy rates increased to 78.2% (222/284) and 75.3% (217/287), respectively. The accuracy rates tended to be lower for physics, chemistry, and calculation questions. Conclusions: Although GPT-4 demonstrated the potential to answer questions from the JNEP and support pharmacists' capabilities, it also showed limitations in handling highly specialized questions, calculation questions, and questions requiring diagram recognition. Further evaluation is necessary to explore its applicability in real-world clinical settings, considering the complexities of patient scenarios and collaboration with health care professionals. By addressing these limitations, GPT-4 could become a more reliable tool for pharmacists in their daily practice. UR - https://mededu.jmir.org/2023/1/e48452 UR - http://dx.doi.org/10.2196/48452 UR - http://www.ncbi.nlm.nih.gov/pubmed/37837968 ID - info:doi/10.2196/48452 ER - TY - JOUR AU - Chin, Hyojin AU - Song, Hyeonho AU - Baek, Gumhee AU - Shin, Mingi AU - Jung, Chani AU - Cha, Meeyoung AU - Choi, Junghoi AU - Cha, Chiyoung PY - 2023/10/20 TI - The Potential of Chatbots for Emotional Support and Promoting Mental Well-Being in Different Cultures: Mixed Methods Study JO - J Med Internet Res SP - e51712 VL - 25 KW - chatbot KW - depressive mood KW - sad KW - depressive discourse KW - sentiment analysis KW - conversational agent KW - mental health KW - health information KW - cultural differences N2 - Background: Artificial intelligence chatbot research has focused on technical advances in natural language processing and validating the effectiveness of human-machine conversations in specific settings.
However, real-world chat data remain proprietary and unexplored despite their growing popularity, and new analyses of chatbot uses and their effects on mitigating negative moods are urgently needed. Objective: In this study, we investigated whether and how artificial intelligence chatbots facilitate the expression of user emotions, specifically sadness and depression. We also examined cultural differences in the expression of depressive moods among users in Western and Eastern countries. Methods: This study used SimSimi, a global open-domain social chatbot, to analyze 152,783 conversation utterances containing the terms "depress" and "sad" in 3 Western countries (Canada, the United Kingdom, and the United States) and 5 Eastern countries (Indonesia, India, Malaysia, the Philippines, and Thailand). Study 1 reports new findings on the cultural differences in how people talk about depression and sadness to chatbots based on Linguistic Inquiry and Word Count and n-gram analyses. In study 2, we classified chat conversations into predefined topics using semisupervised classification techniques to better understand the types of depressive moods prevalent in chats. We then identified the distinguishing features of chat-based depressive discourse data and the disparity between Eastern and Western users. Results: Our data revealed intriguing cultural differences. Chatbot users in Eastern countries indicated stronger emotions about depression than users in Western countries (positive: P<.001; negative: P=.01); for example, Eastern users used more words associated with sadness (P=.01). However, Western users were more likely to share vulnerable topics such as mental health (P<.001), and this group also had a greater tendency to discuss sensitive topics such as swear words (P<.001) and death (P<.001). In addition, when talking to chatbots, people expressed their depressive moods differently than on other platforms.
Users were more open to expressing emotional vulnerability related to depressive or sad moods to chatbots (74,045/148,590, 49.83%) than on social media (149/1978, 7.53%). Chatbot conversations tended not to broach topics that require social support from others, such as seeking advice on daily life difficulties, unlike on social media. However, chatbot users acted in anticipation of conversational agents that exhibit active listening skills and foster a safe space where they can openly share emotional states such as sadness or depression. Conclusions: The findings highlight the potential of chatbot-assisted mental health support, emphasizing the importance of continued technical and policy-wise efforts to improve chatbot interactions for those in need of emotional assistance. Our data indicate the possibility of chatbots providing helpful information about depressive moods, especially for users who have difficulty communicating emotions to other humans. UR - https://www.jmir.org/2023/1/e51712 UR - http://dx.doi.org/10.2196/51712 UR - http://www.ncbi.nlm.nih.gov/pubmed/37862063 ID - info:doi/10.2196/51712 ER - TY - JOUR AU - Preiksaitis, Carl AU - Rose, Christian PY - 2023/10/20 TI - Opportunities, Challenges, and Future Directions of Generative Artificial Intelligence in Medical Education: Scoping Review JO - JMIR Med Educ SP - e48785 VL - 9 KW - medical education KW - artificial intelligence KW - ChatGPT KW - Bard KW - AI KW - educator KW - scoping KW - review KW - learner KW - generative N2 - Background: Generative artificial intelligence (AI) technologies are increasingly being utilized across various fields, with considerable interest and concern regarding their potential application in medical education. These technologies, such as ChatGPT and Bard, can generate new content and have a wide range of possible applications. Objective: This study aimed to synthesize the potential opportunities and limitations of generative AI in medical education.
It sought to identify prevalent themes within recent literature regarding potential applications and challenges of generative AI in medical education and use these to guide future areas for exploration. Methods: We conducted a scoping review, following the framework by Arksey and O'Malley, of English language articles published from 2022 onward that discussed generative AI in the context of medical education. A literature search was performed using PubMed, Web of Science, and Google Scholar databases. We screened articles for inclusion, extracted data from relevant studies, and completed a quantitative and qualitative synthesis of the data. Results: Thematic analysis revealed diverse potential applications for generative AI in medical education, including self-directed learning, simulation scenarios, and writing assistance. However, the literature also highlighted significant challenges, such as issues with academic integrity, data accuracy, and potential detriments to learning. Based on these themes and the current state of the literature, we propose the following 3 key areas for investigation: developing learners' skills to evaluate AI critically, rethinking assessment methodology, and studying human-AI interactions. Conclusions: The integration of generative AI in medical education presents exciting opportunities, alongside considerable challenges. There is a need to develop new skills and competencies related to AI as well as thoughtful, nuanced approaches to examine the growing use of generative AI in medical education. UR - https://mededu.jmir.org/2023/1/e48785/ UR - http://dx.doi.org/10.2196/48785 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/48785 ER - TY - JOUR AU - Hu, Je-Ming AU - Liu, Feng-Cheng AU - Chu, Chi-Ming AU - Chang, Yu-Tien PY - 2023/10/18 TI - Health Care Trainees' and Professionals'
Perceptions of ChatGPT in Improving Medical Knowledge Training: Rapid Survey Study JO - J Med Internet Res SP - e49385 VL - 25 KW - ChatGPT KW - large language model KW - medicine KW - perception evaluation KW - internet survey KW - structural equation modeling KW - SEM N2 - Background: ChatGPT is a powerful pretrained large language model. It has both demonstrated potential and raised concerns related to knowledge translation and knowledge transfer. To apply and improve knowledge transfer in the real world, it is essential to assess the perceptions and acceptance of the users of ChatGPT-assisted training. Objective: We aimed to investigate the perceptions of health care trainees and professionals on ChatGPT-assisted training, using biomedical informatics as an example. Methods: We used purposeful sampling to include all health care undergraduate trainees and graduate professionals (n=195) from January to May 2023 in the School of Public Health at the National Defense Medical Center in Taiwan. Subjects were asked to watch a 2-minute video introducing 5 scenarios about ChatGPT-assisted training in biomedical informatics and then answer a self-designed online (web- and mobile-based) questionnaire according to the Kirkpatrick model. The survey responses were used to develop 4 constructs: "perceived knowledge acquisition," "perceived training motivation," "perceived training satisfaction," and "perceived training effectiveness." The study used structural equation modeling (SEM) to evaluate and test the structural model and hypotheses. Results: The online questionnaire response rate was 152 of 195 (78%); 88 of 152 participants (58%) were undergraduate trainees and 90 of 152 participants (59%) were women. The ages ranged from 18 to 53 years (mean 23.3, SD 6.0 years). There was no statistical difference in perceptions of training evaluation between men and women.
Most participants were enthusiastic about the ChatGPT-assisted training, while the graduate professionals were more enthusiastic than undergraduate trainees. Nevertheless, some concerns were raised about potential cheating on training assessment. The average scores for knowledge acquisition, training motivation, training satisfaction, and training effectiveness were 3.84 (SD 0.80), 3.76 (SD 0.93), 3.75 (SD 0.87), and 3.72 (SD 0.91), respectively (Likert scale 1-5: strongly disagree to strongly agree). Knowledge acquisition had the highest score and training effectiveness the lowest. In the SEM results, training effectiveness was influenced predominantly by knowledge acquisition and partially met the hypotheses in the research framework. Knowledge acquisition had a direct effect on training effectiveness, training satisfaction, and training motivation, with β coefficients of .80, .87, and .97, respectively (all P<.001). Conclusions: Most health care trainees and professionals perceived ChatGPT-assisted training as an aid in knowledge transfer. However, to improve training effectiveness, it should be combined with empirical experts for proper guidance and dual interaction. In a future study, we recommend using a larger sample size for evaluation of internet-connected large language models in medical knowledge transfer.
UR - https://www.jmir.org/2023/1/e49385 UR - http://dx.doi.org/10.2196/49385 UR - http://www.ncbi.nlm.nih.gov/pubmed/37851495 ID - info:doi/10.2196/49385 ER - TY - JOUR AU - Brown, Andrew AU - Kumar, Tanuj Ash AU - Melamed, Osnat AU - Ahmed, Imtihan AU - Wang, Hao Yu AU - Deza, Arnaud AU - Morcos, Marc AU - Zhu, Leon AU - Maslej, Marta AU - Minian, Nadia AU - Sujaya, Vidya AU - Wolff, Jodi AU - Doggett, Olivia AU - Iantorno, Mathew AU - Ratto, Matt AU - Selby, Peter AU - Rose, Jonathan PY - 2023/10/17 TI - A Motivational Interviewing Chatbot With Generative Reflections for Increasing Readiness to Quit Smoking: Iterative Development Study JO - JMIR Ment Health SP - e49132 VL - 10 KW - conversational agents KW - chatbots KW - behavior change KW - smoking cessation KW - motivational interviewing KW - deep learning KW - natural language processing KW - transformers KW - generative artificial intelligence KW - artificial intelligence KW - AI N2 - Background: The motivational interviewing (MI) approach has been shown to help move ambivalent smokers toward the decision to quit smoking. There have been several attempts to broaden access to MI through text-based chatbots. These typically use scripted responses to client statements, but such nonspecific responses have been shown to reduce effectiveness. Recent advances in natural language processing provide a new way to create responses that are specific to a client's statements, using a generative language model. Objective: This study aimed to design, evolve, and measure the effectiveness of a chatbot system that can guide ambivalent people who smoke toward the decision to quit smoking with MI-style generative reflections. Methods: Over time, 4 different MI chatbot versions were evolved, and each version was tested with a separate group of ambivalent smokers. A total of 349 smokers were recruited through a web-based recruitment platform. The first chatbot version only asked questions without reflections on the answers.
The second version asked the questions and provided reflections with an initial version of the reflection generator. The third version used an improved reflection generator, and the fourth version added extended interaction on some of the questions. Participants' readiness to quit was measured before the conversation and 1 week later using an 11-point scale that measured 3 attributes related to smoking cessation: readiness, confidence, and importance. The number of quit attempts made in the week before the conversation and the week after was surveyed; in addition, participants rated the perceived empathy of the chatbot. The main body of the conversation consists of 5 scripted questions, responses from participants, and (for 3 of the 4 versions) generated reflections. A pretrained transformer-based neural network was fine-tuned on examples of high-quality reflections to generate MI reflections. Results: The increase in average confidence using the nongenerative version was 1.0 (SD 2.0; P=.001), whereas for the 3 generative versions, the increases ranged from 1.2 to 1.3 (SD 2.0-2.3; P<.001). The extended conversation with improved generative reflections was the only version associated with a significant increase in average importance (0.7, SD 2.0; P<.001) and readiness (0.4, SD 1.7; P=.01). The enhanced reflection and extended conversations exhibited significantly better perceived empathy than the nongenerative conversation (P=.02 and P=.004, respectively). The number of quit attempts did not significantly change between the week before the conversation and the week after across all 4 conversations. Conclusions: The results suggest that generative reflections increase the impact of a conversation on readiness to quit smoking 1 week later, although a significant portion of the impact seen so far can be achieved by only asking questions without the reflections.
These results support further evolution of the chatbot conversation and can serve as a basis for comparison against more advanced versions. UR - https://mental.jmir.org/2023/1/e49132 UR - http://dx.doi.org/10.2196/49132 UR - http://www.ncbi.nlm.nih.gov/pubmed/37847539 ID - info:doi/10.2196/49132 ER - TY - JOUR AU - Hoffman, Valerie AU - Flom, Megan AU - Mariano, Y. Timothy AU - Chiauzzi, Emil AU - Williams, Andre AU - Kirvin-Quamme, Andrew AU - Pajarito, Sarah AU - Durden, Emily AU - Perski, Olga PY - 2023/10/13 TI - User Engagement Clusters of an 8-Week Digital Mental Health Intervention Guided by a Relational Agent (Woebot): Exploratory Study JO - J Med Internet Res SP - e47198 VL - 25 KW - anxiety KW - clustering KW - depression KW - digital health KW - digital mental health intervention KW - mental health KW - relational agents KW - user engagement N2 - Background: With the proliferation of digital mental health interventions (DMHIs) guided by relational agents, little is known about the behavioral, cognitive, and affective engagement components associated with symptom improvement over time. Obtaining a better understanding could lend clues about recommended use for particular subgroups of the population, the potency of different intervention components, and the mechanisms underlying the intervention's success. Objective: This exploratory study applied clustering techniques to a range of engagement indicators, which were mapped to the intervention's active components and the connect, attend, participate, and enact (CAPE) model, to examine the prevalence and characterization of each identified cluster among users of a relational agent-guided DMHI. Methods: We invited adults aged 18 years or older who were interested in using digital support to help with mood management or stress reduction through social media to participate in an 8-week DMHI guided by a natural language processing–supported relational agent, Woebot.
Users completed assessments of affective and cognitive engagement, working alliance as measured by goal and task working alliance subscale scores, and enactment (ie, application of therapeutic recommendations in real-world settings). The app passively collected data on behavioral engagement (ie, utilization). We applied agglomerative hierarchical clustering analysis to the engagement indicators to identify the number of clusters that provided the best fit to the data collected, characterized the clusters, and then examined associations with baseline demographic and clinical characteristics as well as mental health outcomes at week 8. Results: Exploratory analyses (n=202) supported 3 clusters: (1) "typical utilizers" (n=81, 40%), who had intermediate levels of behavioral engagement; (2) "early utilizers" (n=58, 29%), who had the nominally highest levels of behavioral engagement in week 1; and (3) "efficient engagers" (n=63, 31%), who had significantly higher levels of affective and cognitive engagement but the lowest level of behavioral engagement. With respect to mental health baseline and outcome measures, efficient engagers had significantly higher levels of baseline resilience (P<.001) and greater declines in depressive symptoms (P=.01) and stress (P=.01) from baseline to week 8 compared to typical utilizers. Significant differences across clusters were found by age, gender identity, race and ethnicity, sexual orientation, education, and insurance coverage. The main analytic findings remained robust in sensitivity analyses. Conclusions: There were 3 distinct engagement clusters found, each with distinct baseline demographic and clinical traits and mental health outcomes. Additional research is needed to inform fine-grained recommendations regarding optimal engagement and to determine the best sequence of particular intervention components with known potency.
The findings represent an important first step in disentangling the complex interplay between different affective, cognitive, and behavioral engagement indicators and outcomes associated with use of a DMHI incorporating a natural language processing–supported relational agent. Trial Registration: ClinicalTrials.gov NCT05672745; https://classic.clinicaltrials.gov/ct2/show/NCT05672745 UR - https://www.jmir.org/2023/1/e47198 UR - http://dx.doi.org/10.2196/47198 UR - http://www.ncbi.nlm.nih.gov/pubmed/37831490 ID - info:doi/10.2196/47198 ER - TY - JOUR AU - Kang, Annie AU - Hetrick, Sarah AU - Cargo, Tania AU - Hopkins, Sarah AU - Ludin, Nicola AU - Bodmer, Sarah AU - Stevenson, Kiani AU - Holt-Quick, Chester AU - Stasiak, Karolina PY - 2023/10/12 TI - Exploring Young Adults' Views About Aroha, a Chatbot for Stress Associated With the COVID-19 Pandemic: Interview Study Among Students JO - JMIR Form Res SP - e44556 VL - 7 KW - chatbot KW - mental health KW - COVID-19 KW - young adults KW - acceptability KW - qualitative methods N2 - Background: In March 2020, New Zealand was plunged into its first nationwide lockdown to halt the spread of COVID-19. Our team rapidly adapted our existing chatbot platform to create Aroha, a well-being chatbot intended to address the stress experienced by young people aged 13 to 24 years in the early phase of the pandemic. Aroha was made available nationally within 2 weeks of the lockdown and continued to be available throughout 2020. Objective: In this study, we aimed to evaluate the acceptability and relevance of the chatbot format and Aroha's content in young adults and to identify areas for improvement. Methods: We conducted qualitative in-depth and semistructured interviews with young adults as well as in situ demonstrations of Aroha to elicit immediate feedback. Interviews were recorded, transcribed, and analyzed using thematic analysis assisted by NVivo (version 12; QSR International).
Results: A total of 15 young adults (age in years: median 20; mean 20.07, SD 3.17; female students: n=13, 87%; male students: n=2, 13%; all tertiary students) were interviewed in person. Participants spoke of the challenges of living during the lockdown, including social isolation, loss of motivation, and the demands of remote work or study, although some were able to find silver linings. Aroha was well liked for sounding like a "real person" and peer with its friendly local "Kiwi" communication style, rather than an authoritative adult or counselor. The chatbot was praised for including content that went beyond traditional mental health advice. Participants particularly enjoyed the modules on gratitude, being active, anger management, job seeking, and how to deal with alcohol and drugs. Aroha was described as being more accessible than traditional mental health counseling and resources. It was an appealing option for those who did not want to talk to someone in person for fear of the stigma associated with mental health. However, participants disliked the software bugs. They also wanted a more sophisticated conversational interface where they could express themselves and "vent" in free text. There were several suggestions for making Aroha more relevant to a diverse range of users, including developing content on navigating relationships and diverse chatbot avatars. Conclusions: Chatbots are an acceptable format for scaling up the delivery of public mental health and well-being–enhancing strategies. We make the following recommendations for others interested in designing and rolling out mental health chatbots to better support young people: make the chatbot relatable to its target audience by working with them to develop an authentic and relevant communication style; consider including holistic health and lifestyle content beyond traditional "mental health" support; and focus on developing features that make users feel heard, understood, and empowered.
UR - https://formative.jmir.org/2023/1/e44556 UR - http://dx.doi.org/10.2196/44556 UR - http://www.ncbi.nlm.nih.gov/pubmed/37527545 ID - info:doi/10.2196/44556 ER - TY - JOUR AU - Rambaud, Kimberly AU - van Woerden, Simon AU - Palumbo, Leonardo AU - Salvi, Cristiana AU - Smallwood, Catherine AU - Rockenschaub, Gerald AU - Okoliyski, Michail AU - Marinova, Lora AU - Fomaidi, Galina AU - Djalalova, Malika AU - Faruqui, Nabiha AU - Melo Bianco, Viviane AU - Mosquera, Mario AU - Spasov, Ivaylo AU - Totskaya, Yekaterina PY - 2023/10/10 TI - Building a Chatbot in a Pandemic JO - J Med Internet Res SP - e42960 VL - 25 KW - COVID-19 KW - chatbots KW - evidence-based communication channels KW - conversational agent KW - user-centered KW - health promotion KW - digital health intervention KW - online health information KW - digital health tool KW - health communication UR - https://www.jmir.org/2023/1/e42960 UR - http://dx.doi.org/10.2196/42960 UR - http://www.ncbi.nlm.nih.gov/pubmed/37074958 ID - info:doi/10.2196/42960 ER - TY - JOUR AU - Hirosawa, Takanobu AU - Kawamura, Ren AU - Harada, Yukinori AU - Mizuta, Kazuya AU - Tokumasu, Kazuki AU - Kaji, Yuki AU - Suzuki, Tomoharu AU - Shimizu, Taro PY - 2023/10/9 TI - ChatGPT-Generated Differential Diagnosis Lists for Complex Case–Derived Clinical Vignettes: Diagnostic Accuracy Evaluation JO - JMIR Med Inform SP - e48808 VL - 11 KW - artificial intelligence KW - AI chatbot KW - ChatGPT KW - large language models KW - clinical decision support KW - natural language processing KW - diagnostic excellence KW - language model KW - vignette KW - case study KW - diagnostic KW - accuracy KW - decision support KW - diagnosis N2 - Background: The diagnostic accuracy of differential diagnoses generated by artificial intelligence chatbots, including ChatGPT models, for complex clinical vignettes derived from general internal medicine (GIM) department case reports is unknown.
Objective: This study aims to evaluate the accuracy of the differential diagnosis lists generated by both third-generation ChatGPT (ChatGPT-3.5) and fourth-generation ChatGPT (ChatGPT-4) by using case vignettes from case reports published by the Department of GIM of Dokkyo Medical University Hospital, Japan. Methods: We searched PubMed for case reports. Upon identification, physicians selected diagnostic cases, determined the final diagnosis, and presented them as clinical vignettes. Physicians entered the clinical vignettes into the ChatGPT-3.5 and ChatGPT-4 prompts to generate the top 10 differential diagnoses. The ChatGPT models were not specially trained or further reinforced for this task. Three GIM physicians from other medical institutions created differential diagnosis lists by reading the same clinical vignettes. We measured the rate of correct diagnosis within the top 10 differential diagnosis lists, top 5 differential diagnosis lists, and the top diagnosis. Results: In total, 52 case reports were analyzed. The rates of correct diagnosis by ChatGPT-4 within the top 10 differential diagnosis lists, top 5 differential diagnosis lists, and top diagnosis were 83% (43/52), 81% (42/52), and 60% (31/52), respectively. The rates of correct diagnosis by ChatGPT-3.5 within the top 10 differential diagnosis lists, top 5 differential diagnosis lists, and top diagnosis were 73% (38/52), 65% (34/52), and 42% (22/52), respectively. The rates of correct diagnosis by ChatGPT-4 were comparable to those by physicians within the top 10 (43/52, 83% vs 39/52, 75%, respectively; P=.47) and within the top 5 (42/52, 81% vs 35/52, 67%, respectively; P=.18) differential diagnosis lists and top diagnosis (31/52, 60% vs 26/52, 50%, respectively; P=.43), although the differences were not significant. The ChatGPT models' diagnostic accuracy did not significantly vary based on open access status or the publication date (before 2011 vs 2022).
Conclusions: This study demonstrates the potential diagnostic accuracy of differential diagnosis lists generated using ChatGPT-3.5 and ChatGPT-4 for complex clinical vignettes from case reports published by the GIM department. The rate of correct diagnoses within the top 10 and top 5 differential diagnosis lists generated by ChatGPT-4 exceeds 80%. Although derived from a limited data set of case reports from a single department, our findings highlight the potential utility of ChatGPT-4 as a supplementary tool for physicians, particularly for those affiliated with the GIM department. Further investigations should explore the diagnostic accuracy of ChatGPT by using distinct case materials beyond its training data. Such efforts will provide a comprehensive insight into the role of artificial intelligence in enhancing clinical decision-making. UR - https://medinform.jmir.org/2023/1/e48808 UR - http://dx.doi.org/10.2196/48808 UR - http://www.ncbi.nlm.nih.gov/pubmed/37812468 ID - info:doi/10.2196/48808 ER - TY - JOUR AU - Andrews, Emma Nicole AU - Ireland, David AU - Vijayakumar, Pranavie AU - Burvill, Lyza AU - Hay, Elizabeth AU - Westerman, Daria AU - Rose, Tanya AU - Schlumpf, Mikaela AU - Strong, Jenny AU - Claus, Andrew PY - 2023/10/6 TI - Acceptability of a Pain History Assessment and Education Chatbot (Dolores) Across Age Groups in Populations With Chronic Pain: Development and Pilot Testing JO - JMIR Form Res SP - e47267 VL - 7 KW - chronic pain KW - education KW - neurophysiology KW - neuroscience KW - conversation agent KW - chatbot KW - age KW - young adult KW - adolescence KW - adolescent KW - pain KW - patient education KW - usability KW - acceptability KW - mobile health KW - mHealth KW - mobile app KW - health app KW - youth KW - mobile phone N2 - Background: The delivery of education on pain neuroscience and the evidence for different treatment approaches has become a key component of contemporary persistent pain management. 
Chatbots, or more formally conversation agents, are increasingly being used in health care settings due to their versatility in providing interactive and individualized approaches to both capture and deliver information. Research focused on the acceptability of diverse chatbot formats can assist in developing a better understanding of the educational needs of target populations. Objective: This study aims to detail the development and initial pilot testing of a multimodality pain education chatbot (Dolores) that can be used across different age groups and investigate whether acceptability and feedback were comparable across age groups following pilot testing. Methods: Following an initial design phase involving software engineers (n=2) and expert clinicians (n=6), a total of 60 individuals with chronic pain who attended an outpatient clinic at 1 of 2 pain centers in Australia were recruited for pilot testing. The 60 individuals consisted of 20 (33%) adolescents (aged 10-18 years), 20 (33%) young adults (aged 19-35 years), and 20 (33%) adults (aged >35 years) with persistent pain. Participants spent 20 to 30 minutes completing interactive chatbot activities that enabled the Dolores app to gather a pain history and provide education about pain and pain treatments. After the chatbot activities, participants completed a custom-made feedback questionnaire measuring the acceptability constructs pertaining to health education chatbots. To determine the effect of age group on the acceptability ratings and feedback provided, a series of binomial logistic regression models and cumulative odds ordinal logistic regression models with proportional odds were generated. Results: Overall, acceptability was high for the following constructs: engagement, perceived value, usability, accuracy, responsiveness, adoption intention, esthetics, and overall quality. The effect of age group on all acceptability ratings was small and not statistically significant. 
An analysis of open-ended question responses revealed that major frustrations with the app were related to Dolores' speech, which was explored further through a comparative analysis. With respect to providing negative feedback about Dolores' speech, a logistic regression model showed that the effect of age group was statistically significant (χ²₂=11.7; P=.003) and explained 27.1% of the variance (Nagelkerke R²). Adults and young adults were less likely to comment on Dolores' speech compared with adolescent participants (odds ratio 0.20, 95% CI 0.05-0.84 and odds ratio 0.05, 95% CI 0.01-0.43, respectively). Comments were related to both speech rate (too slow) and quality (unpleasant and robotic). Conclusions: This study provides support for the acceptability of pain history and education chatbots across different age groups. Chatbot acceptability for adolescent cohorts may be improved by enabling the self-selection of speech characteristics such as rate and personable tone. UR - https://formative.jmir.org/2023/1/e47267 UR - http://dx.doi.org/10.2196/47267 UR - http://www.ncbi.nlm.nih.gov/pubmed/37801342 ID - info:doi/10.2196/47267 ER - TY - JOUR AU - Meskó, Bertalan PY - 2023/10/4 TI - Prompt Engineering as an Important Emerging Skill for Medical Professionals: Tutorial JO - J Med Internet Res SP - e50638 VL - 25 KW - artificial intelligence KW - AI KW - digital health KW - future KW - technology KW - ChatGPT KW - GPT-4 KW - large language models KW - language model KW - LLM KW - prompt KW - prompts KW - prompt engineering KW - AI tool KW - engineering KW - healthcare professional KW - decision-making KW - LLMs KW - chatbot KW - chatbots KW - conversational agent KW - conversational agents KW - NLP KW - natural language processing UR - https://www.jmir.org/2023/1/e50638 UR - http://dx.doi.org/10.2196/50638 UR - http://www.ncbi.nlm.nih.gov/pubmed/37792434 ID - info:doi/10.2196/50638 ER - TY - JOUR AU - Passanante, Aly AU - Pertwee, Ed AU - Lin, Leesa AU - Lee,
Yoonsup Kristi AU - Wu, T. Joseph AU - Larson, J. Heidi PY - 2023/10/3 TI - Conversational AI and Vaccine Communication: Systematic Review of the Evidence JO - J Med Internet Res SP - e42758 VL - 25 KW - chatbots KW - artificial intelligence KW - conversational AI KW - vaccine communication KW - vaccine hesitancy KW - conversational agent KW - COVID-19 KW - vaccine information KW - health information N2 - Background: Since the mid-2010s, use of conversational artificial intelligence (AI; chatbots) in health care has expanded significantly, especially in the context of increased burdens on health systems and restrictions on in-person consultations with health care providers during the COVID-19 pandemic. One emerging use for conversational AI is to capture evolving questions and communicate information about vaccines and vaccination. Objective: The objective of this systematic review was to examine documented uses and evidence on the effectiveness of conversational AI for vaccine communication. Methods: This systematic review was conducted following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. PubMed, Web of Science, PsycINFO, MEDLINE, Scopus, CINAHL Complete, Cochrane Library, Embase, Epistemonikos, Global Health, Global Index Medicus, Academic Search Complete, and the University of London library database were searched for papers on the use of conversational AI for vaccine communication. The inclusion criteria were studies that included (1) documented instances of conversational AI being used for the purpose of vaccine communication and (2) evaluation data on the impact and effectiveness of the intervention. Results: After duplicates were removed, the review identified 496 unique records, which were then screened by title and abstract, of which 38 were identified for full-text review. Seven fit the inclusion criteria and were assessed and summarized in the findings of this review. 
Overall, vaccine chatbots deployed to date have been relatively simple in their design and have mainly been used to provide factual information to users in response to their questions about vaccines. Additionally, chatbots have been used for vaccination scheduling, appointment reminders, debunking misinformation, and, in some cases, for vaccine counseling and persuasion. Available evidence suggests that chatbots can have a positive effect on vaccine attitudes; however, studies were typically exploratory in nature, and some lacked a control group or had very small sample sizes. Conclusions: The review found evidence of potential benefits from conversational AI for vaccine communication. Factors that may contribute to the effectiveness of vaccine chatbots include their ability to provide credible and personalized information in real time, the familiarity and accessibility of the chatbot platform, and the extent to which interactions with the chatbot feel "natural" to users. However, evaluations have focused on the short-term, direct effects of chatbots on their users. The potential longer-term and societal impacts of conversational AI have yet to be analyzed. In addition, existing studies do not adequately address how ethics apply in the field of conversational AI around vaccines. In a context where further digitalization of vaccine communication can be anticipated, additional high-quality research will be required across all these areas. UR - https://www.jmir.org/2023/1/e42758 UR - http://dx.doi.org/10.2196/42758 UR - http://www.ncbi.nlm.nih.gov/pubmed/37788057 ID - info:doi/10.2196/42758 ER - TY - JOUR AU - Flores-Cohaila, A. Javier AU - García-Vicente, Abigaíl AU - Vizcarra-Jiménez, F. Sonia AU - De la Cruz-Galán, P. Janith AU - Gutiérrez-Arratia, D.
Jesús AU - Quiroga Torres, Geraldine Blanca AU - Taype-Rondan, Alvaro PY - 2023/9/28 TI - Performance of ChatGPT on the Peruvian National Licensing Medical Examination: Cross-Sectional Study JO - JMIR Med Educ SP - e48039 VL - 9 KW - medical education KW - generative pre-trained transformer KW - ChatGPT KW - licensing examination KW - assessment KW - Peru KW - Examen Nacional de Medicina KW - ENAM KW - learning model KW - artificial intelligence KW - AI KW - medical examination N2 - Background: ChatGPT has shown impressive performance in national medical licensing examinations, such as the United States Medical Licensing Examination (USMLE), even passing it with expert-level performance. However, there is a lack of research on its performance in low-income countries' national licensing medical examinations. In Peru, where almost one out of three examinees fails the national licensing medical examination, ChatGPT has the potential to enhance medical education. Objective: We aimed to assess the accuracy of ChatGPT using GPT-3.5 and GPT-4 on the Peruvian National Licensing Medical Examination (Examen Nacional de Medicina [ENAM]). Additionally, we sought to identify factors associated with incorrect answers provided by ChatGPT. Methods: We used the ENAM 2022 data set, which consisted of 180 multiple-choice questions, to evaluate the performance of ChatGPT. Various prompts were used, and accuracy was evaluated. The performance of ChatGPT was compared to that of a sample of 1025 examinees. Factors such as question type, Peruvian-specific knowledge, discrimination, difficulty, quality of questions, and subject were analyzed to determine their influence on incorrect answers. Questions that received incorrect answers underwent a three-step process involving different prompts to explore the potential impact of adding roles and context on ChatGPT's accuracy. Results: GPT-4 achieved an accuracy of 86% on the ENAM, followed by GPT-3.5 with 77%.
The accuracy obtained by the 1025 examinees was 55%. There was a fair agreement (κ=0.38) between GPT-3.5 and GPT-4. Moderate-to-high-difficulty questions were associated with incorrect answers in the crude and adjusted model for GPT-3.5 (odds ratio [OR] 6.6, 95% CI 2.73-15.95) and GPT-4 (OR 33.23, 95% CI 4.3-257.12). After reinputting questions that received incorrect answers, GPT-3.5 went from 41 (100%) to 12 (29%) incorrect answers, and GPT-4 from 25 (100%) to 4 (16%). Conclusions: Our study found that ChatGPT (GPT-3.5 and GPT-4) can achieve expert-level performance on the ENAM, outperforming most of our examinees. We found fair agreement between both GPT-3.5 and GPT-4. Incorrect answers were associated with the difficulty of questions, which may resemble human performance. Furthermore, by reinputting questions that initially received incorrect answers with different prompts containing additional roles and context, ChatGPT achieved improved accuracy. UR - https://mededu.jmir.org/2023/1/e48039 UR - http://dx.doi.org/10.2196/48039 UR - http://www.ncbi.nlm.nih.gov/pubmed/37768724 ID - info:doi/10.2196/48039 ER - TY - JOUR AU - Wutz, Maximilian AU - Hermes, Marius AU - Winter, Vera AU - Köberlein-Neu, Juliane PY - 2023/9/26 TI - Factors Influencing the Acceptability, Acceptance, and Adoption of Conversational Agents in Health Care: Integrative Review JO - J Med Internet Res SP - e46548 VL - 25 KW - conversational agent KW - chatbot KW - acceptability KW - acceptance KW - adoption KW - health care KW - digital health KW - artificial intelligence KW - AI KW - natural language KW - mobile phone N2 - Background: Conversational agents (CAs), also known as chatbots, are digital dialog systems that enable people to have a text-based, speech-based, or nonverbal conversation with a computer or another machine based on natural language via an interface. The use of CAs offers new opportunities and various benefits for health care.
However, they are not yet ubiquitous in daily practice. Nevertheless, research regarding the implementation of CAs in health care has grown tremendously in recent years. Objective: This review aims to present a synthesis of the factors that facilitate or hinder the implementation of CAs from the perspectives of patients and health care professionals. Specifically, it focuses on the early implementation outcomes of acceptability, acceptance, and adoption as cornerstones of later implementation success. Methods: We performed an integrative review. To identify relevant literature, a broad literature search was conducted in June 2021 with no date limits and using all fields in PubMed, Cochrane Library, Web of Science, LIVIVO, and PsycINFO. To keep the review current, another search was conducted in March 2022. To identify as many eligible primary sources as possible, we used a snowballing approach by searching reference lists and conducted a hand search. Factors influencing the acceptability, acceptance, and adoption of CAs in health care were coded through parallel deductive and inductive approaches, which were informed by current technology acceptance and adoption models. Finally, the factors were synthesized in a thematic map. Results: Overall, 76 studies were included in this review. We identified influencing factors related to 4 core Unified Theory of Acceptance and Use of Technology (UTAUT) and Unified Theory of Acceptance and Use of Technology 2 (UTAUT2) factors (performance expectancy, effort expectancy, facilitating conditions, and hedonic motivation), with most studies underlining the relevance of performance and effort expectancy. To meet the particularities of the health care context, we redefined the UTAUT2 factors social influence, habit, and price value. We identified 6 other influencing factors: perceived risk, trust, anthropomorphism, health issue, working alliance, and user characteristics. 
Overall, we identified 10 factors influencing acceptability, acceptance, and adoption among health care professionals (performance expectancy, effort expectancy, facilitating conditions, social influence, price value, perceived risk, trust, anthropomorphism, working alliance, and user characteristics) and 13 factors influencing acceptability, acceptance, and adoption among patients (additionally hedonic motivation, habit, and health issue). Conclusions: This review shows manifold factors influencing the acceptability, acceptance, and adoption of CAs in health care. Knowledge of these factors is fundamental for implementation planning. Therefore, the findings of this review can serve as a basis for future studies to develop appropriate implementation strategies. Furthermore, this review provides an empirical test of current technology acceptance and adoption models and identifies areas where additional research is necessary. Trial Registration: PROSPERO CRD42022343690; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=343690 UR - https://www.jmir.org/2023/1/e46548 UR - http://dx.doi.org/10.2196/46548 UR - http://www.ncbi.nlm.nih.gov/pubmed/37751279 ID - info:doi/10.2196/46548 ER - TY - JOUR AU - Miao, Hongyu AU - Li, Chengdong AU - Wang, Jing PY - 2023/9/26 TI - A Future of Smarter Digital Health Empowered by Generative Pretrained Transformer JO - J Med Internet Res SP - e49963 VL - 25 KW - generative pretrained model KW - artificial intelligence KW - digital health KW - generative pretrained transformer KW - ChatGPT KW - precision medicine KW - AI KW - privacy KW - ethics UR - https://www.jmir.org/2023/1/e49963 UR - http://dx.doi.org/10.2196/49963 UR - http://www.ncbi.nlm.nih.gov/pubmed/37751243 ID - info:doi/10.2196/49963 ER - TY - JOUR AU - Levkovich, Inbar AU - Elyoseph, Zohar PY - 2023/9/20 TI - Suicide Risk Assessments Through the Eyes of ChatGPT-3.5 Versus ChatGPT-4: Vignette Study JO - JMIR Ment Health SP - e51232 VL - 10 KW - artificial 
intelligence KW - ChatGPT KW - diagnosis KW - psychological assessment KW - psychological KW - suicide risk KW - risk assessment KW - text vignette KW - NLP KW - natural language processing KW - suicide KW - suicidal KW - risk KW - assessment KW - vignette KW - vignettes KW - assessments KW - mental KW - self-harm N2 - Background: ChatGPT, a linguistic artificial intelligence (AI) model engineered by OpenAI, offers prospective contributions to mental health professionals. Although having significant theoretical implications, ChatGPT's practical capabilities, particularly regarding suicide prevention, have not yet been substantiated. Objective: The study's aim was to evaluate ChatGPT's ability to assess suicide risk, taking into consideration 2 discernible factors (perceived burdensomeness and thwarted belongingness) over a 2-month period. In addition, we evaluated whether ChatGPT-4 more accurately evaluated suicide risk than did ChatGPT-3.5. Methods: ChatGPT was tasked with assessing a vignette that depicted a hypothetical patient exhibiting differing degrees of perceived burdensomeness and thwarted belongingness. The assessments generated by ChatGPT were subsequently contrasted with standard evaluations rendered by mental health professionals. Using both ChatGPT-3.5 and ChatGPT-4 (May 24, 2023), we executed 3 evaluative procedures in June and July 2023. Our intent was to scrutinize ChatGPT-4's proficiency in assessing various facets of suicide risk in relation to the evaluative abilities of both mental health professionals and an earlier version of ChatGPT-3.5 (March 14 version). Results: During the period of June and July 2023, we found that the likelihood of suicide attempts as evaluated by ChatGPT-4 was similar to the norms of mental health professionals (n=379) under all conditions (average Z score of 0.01). 
Nonetheless, a pronounced discrepancy was observed regarding the assessments performed by ChatGPT-3.5 (May version), which markedly underestimated the potential for suicide attempts, in comparison to the assessments carried out by the mental health professionals (average Z score of −0.83). The empirical evidence suggests that ChatGPT-4's evaluation of the incidence of suicidal ideation and psychache was higher than that of the mental health professionals (average Z score of 0.47 and 1.00, respectively). Conversely, the level of resilience as assessed by both ChatGPT-4 and ChatGPT-3.5 (both versions) was observed to be lower in comparison to the assessments offered by mental health professionals (average Z score of −0.89 and −0.90, respectively). Conclusions: The findings suggest that ChatGPT-4 estimates the likelihood of suicide attempts in a manner akin to evaluations provided by professionals. In terms of recognizing suicidal ideation, ChatGPT-4 appears to be more precise. However, regarding psychache, there was an observed overestimation by ChatGPT-4, indicating a need for further research. These results have implications regarding ChatGPT-4's potential to support gatekeepers, patients, and even mental health professionals' decision-making. Despite the clinical potential, intensive follow-up studies are necessary to establish the use of ChatGPT-4's capabilities in clinical practice. The finding that ChatGPT-3.5 frequently underestimates suicide risk, especially in severe cases, is particularly troubling. It indicates that ChatGPT may downplay one's actual suicide risk level. 
UR - https://mental.jmir.org/2023/1/e51232 UR - http://dx.doi.org/10.2196/51232 UR - http://www.ncbi.nlm.nih.gov/pubmed/37728984 ID - info:doi/10.2196/51232 ER - TY - JOUR AU - Pendergrast, Tricia AU - Chalmers, Zachary PY - 2023/9/20 TI - Anki Tagger: A Generative AI Tool for Aligning Third-Party Resources to Preclinical Curriculum JO - JMIR Med Educ SP - e48780 VL - 9 KW - ChatGPT KW - undergraduate medical education KW - large language models KW - Anki KW - flashcards KW - artificial intelligence KW - AI UR - https://mededu.jmir.org/2023/1/e48780 UR - http://dx.doi.org/10.2196/48780 UR - http://www.ncbi.nlm.nih.gov/pubmed/37728965 ID - info:doi/10.2196/48780 ER - TY - JOUR AU - Huang, ST Ryan AU - Lu, Qi Kevin Jia AU - Meaney, Christopher AU - Kemppainen, Joel AU - Punnett, Angela AU - Leung, Fok-Han PY - 2023/9/19 TI - Assessment of Resident and AI Chatbot Performance on the University of Toronto Family Medicine Residency Progress Test: Comparative Study JO - JMIR Med Educ SP - e50514 VL - 9 KW - medical education KW - medical knowledge exam KW - artificial intelligence KW - AI KW - natural language processing KW - NLP KW - large language model KW - LLM KW - machine learning KW - ChatGPT KW - GPT-3.5 KW - GPT-4 KW - education KW - language model KW - education examination KW - testing KW - utility KW - family medicine KW - medical residents KW - test KW - community N2 - Background: Large language model (LLM)-based chatbots are evolving at an unprecedented pace with the release of ChatGPT, specifically GPT-3.5, and its successor, GPT-4. Their capabilities in general-purpose tasks and language generation have advanced to the point of performing excellently on various educational examination benchmarks, including medical knowledge tests. Comparing the performance of these 2 LLM models to that of Family Medicine residents on a multiple-choice medical knowledge test can provide insights into their potential as medical education tools. 
Objective: This study aimed to quantitatively and qualitatively compare the performance of GPT-3.5, GPT-4, and Family Medicine residents in a multiple-choice medical knowledge test appropriate for the level of a Family Medicine resident. Methods: An official University of Toronto Department of Family and Community Medicine Progress Test consisting of multiple-choice questions was inputted into GPT-3.5 and GPT-4. The artificial intelligence chatbot's responses were manually reviewed to determine the selected answer, response length, response time, provision of a rationale for the outputted response, and the root cause of all incorrect responses (classified into arithmetic, logical, and information errors). The performance of the artificial intelligence chatbots was compared against a cohort of Family Medicine residents who concurrently attempted the test. Results: GPT-4 performed significantly better compared to GPT-3.5 (difference 25.0%, 95% CI 16.3%-32.8%; McNemar test: P<.001); it correctly answered 89/108 (82.4%) questions, while GPT-3.5 answered 62/108 (57.4%) questions correctly. Further, GPT-4 scored higher across all 11 categories of Family Medicine knowledge. In 86.1% (n=93) of the responses, GPT-4 provided a rationale for why other multiple-choice options were not chosen compared to the 16.7% (n=18) achieved by GPT-3.5. Qualitatively, for both GPT-3.5 and GPT-4 responses, logical errors were the most common, while arithmetic errors were the least common. The average performance of Family Medicine residents was 56.9% (95% CI 56.2%-57.6%). The performance of GPT-3.5 was similar to that of the average Family Medicine resident (P=.16), while the performance of GPT-4 exceeded that of the top-performing Family Medicine resident (P<.001). Conclusions: GPT-4 significantly outperforms both GPT-3.5 and Family Medicine residents on a multiple-choice medical knowledge test designed for Family Medicine residents. 
GPT-4 provides a logical rationale for its response choice, ruling out other answer choices efficiently and with concise justification. Its high degree of accuracy and advanced reasoning capabilities facilitate its potential applications in medical education, including the creation of exam questions and scenarios as well as serving as a resource for medical knowledge or information on community services. UR - https://mededu.jmir.org/2023/1/e50514 UR - http://dx.doi.org/10.2196/50514 UR - http://www.ncbi.nlm.nih.gov/pubmed/37725411 ID - info:doi/10.2196/50514 ER - TY - JOUR AU - Chen, Nai-Jung AU - Huang, Chiu-Mieh AU - Fan, Ching-Chih AU - Lu, Li-Ting AU - Lin, Fen-He AU - Liao, Jung-Yu AU - Guo, Jong-Long PY - 2023/9/19 TI - User Evaluation of a Chat-Based Instant Messaging Support Health Education Program for Patients With Chronic Kidney Disease: Preliminary Findings of a Formative Study JO - JMIR Form Res SP - e45484 VL - 7 KW - chronic kidney disease KW - chatbot KW - health education KW - push notification KW - users' evaluation N2 - Background: Artificial intelligence-driven chatbots are increasingly being used in health care, but few chat-based instant messaging support health education programs designed for patients with chronic kidney disease (CKD) have been evaluated for effectiveness. In addition, limited research exists on the usage of chat-based programs among patients with CKD, particularly those that integrate a chatbot aimed at enhancing the communication ability and disease-specific knowledge of patients. Objective: The objective of this formative study is to gather the data necessary to develop an intervention program of chat-based instant messaging support health education for patients with CKD. Participants' user experiences will form the basis for program design improvements. Methods: Data were collected from April to November 2020 using a structured questionnaire. 
A pre-post design was used, and a total of 60 patients consented to join the 3-month program. Among them, 55 successfully completed the study measurements. The System Usability Scale was used for participant evaluations of the usability of the chat-based program. Results: Paired t tests revealed significant differences before and after intervention for communicative literacy (t54=3.99; P<.001) and CKD-specific disease knowledge (t54=7.54; P<.001). Within disease knowledge, significant differences were observed in the aspects of CKD basic knowledge (t54=3.46; P=.001), lifestyle (t54=3.83; P=.001), dietary intake (t54=5.51; P<.001), and medication (t54=4.17; P=.001). However, no significant difference was found in the aspect of disease prevention. Subgroup analysis revealed that while the findings among male participants were similar to those of the main sample, this was not the case among female participants. Conclusions: The findings reveal that a chat-based instant messaging support health education program may be effective for middle-aged and older patients with CKD. The use of a chat-based program with multiple promoting approaches is promising, and users' evaluation is satisfactory. 
Trial Registration: ClinicalTrials.gov NCT05665517; https://clinicaltrials.gov/study/NCT05665517 UR - https://formative.jmir.org/2023/1/e45484 UR - http://dx.doi.org/10.2196/45484 UR - http://www.ncbi.nlm.nih.gov/pubmed/37725429 ID - info:doi/10.2196/45484 ER - TY - JOUR AU - Kuroiwa, Tomoyuki AU - Sarcon, Aida AU - Ibara, Takuya AU - Yamada, Eriku AU - Yamamoto, Akiko AU - Tsukamoto, Kazuya AU - Fujita, Koji PY - 2023/9/15 TI - The Potential of ChatGPT as a Self-Diagnostic Tool in Common Orthopedic Diseases: Exploratory Study JO - J Med Internet Res SP - e47621 VL - 25 KW - ChatGPT KW - generative pretrained transformer KW - natural language processing KW - artificial intelligence KW - chatbot KW - diagnosis KW - self-diagnosis KW - accuracy KW - precision KW - language model KW - orthopedic disease KW - AI model KW - health information N2 - Background: Artificial intelligence (AI) has gained tremendous popularity recently, especially the use of natural language processing (NLP). ChatGPT is a state-of-the-art chatbot capable of creating natural conversations using NLP. The use of AI in medicine can have a tremendous impact on health care delivery. Although some studies have evaluated ChatGPT's accuracy in self-diagnosis, there is no research regarding its precision and the degree to which it recommends medical consultations. Objective: The aim of this study was to evaluate ChatGPT's ability to accurately and precisely self-diagnose common orthopedic diseases, as well as the degree of recommendation it provides for medical consultations. Methods: Over a 5-day course, each of the study authors submitted the same questions to ChatGPT. The conditions evaluated were carpal tunnel syndrome (CTS), cervical myelopathy (CM), lumbar spinal stenosis (LSS), knee osteoarthritis (KOA), and hip osteoarthritis (HOA). Answers were categorized as either correct, partially correct, incorrect, or a differential diagnosis. 
The percentage of correct answers and reproducibility were calculated. The reproducibility between days and between raters was calculated using the Fleiss κ coefficient. Answers that recommended that the patient seek medical attention were recategorized according to the strength of the recommendation as defined by the study. Results: The ratios of correct answers were 25/25, 1/25, 24/25, 16/25, and 17/25 for CTS, CM, LSS, KOA, and HOA, respectively. The ratios of incorrect answers were 23/25 for CM and 0/25 for all other conditions. The reproducibility between days was 1.0, 0.15, 0.7, 0.6, and 0.6 for CTS, CM, LSS, KOA, and HOA, respectively. The reproducibility between raters was 1.0, 0.1, 0.64, −0.12, and 0.04 for CTS, CM, LSS, KOA, and HOA, respectively. Among the answers recommending medical attention, the phrases "essential," "recommended," "best," and "important" were used. Specifically, "essential" occurred in 4 out of 125, "recommended" in 12 out of 125, "best" in 6 out of 125, and "important" in 94 out of 125 answers. Additionally, 7 out of the 125 answers did not include a recommendation to seek medical attention. Conclusions: The accuracy and reproducibility of ChatGPT to self-diagnose five common orthopedic conditions were inconsistent. The accuracy could potentially be improved by adding symptoms that could easily identify a specific location. Only a few answers were accompanied by a strong recommendation to seek medical attention according to our study standards. Although ChatGPT could serve as a potential first step in accessing care, we found variability in accurate self-diagnosis. Given the risk of harm with self-diagnosis without medical follow-up, it would be prudent for an NLP to include clear language alerting patients to seek expert medical opinions. We hope to shed further light on the use of AI in a future clinical study. 
UR - https://www.jmir.org/2023/1/e47621 UR - http://dx.doi.org/10.2196/47621 UR - http://www.ncbi.nlm.nih.gov/pubmed/37713254 ID - info:doi/10.2196/47621 ER - TY - JOUR AU - Khlaif, N. Zuheir AU - Mousa, Allam AU - Hattab, Kamal Muayad AU - Itmazi, Jamil AU - Hassan, A. Amjad AU - Sanmugam, Mageswaran AU - Ayyoub, Abedalkarim PY - 2023/9/14 TI - The Potential and Concerns of Using AI in Scientific Research: ChatGPT Performance Evaluation JO - JMIR Med Educ SP - e47049 VL - 9 KW - artificial intelligence KW - AI KW - ChatGPT KW - scientific research KW - research ethics N2 - Background: Artificial intelligence (AI) has many applications in various aspects of our daily life, including health, criminal, education, civil, business, and liability law. One aspect of AI that has gained significant attention is natural language processing (NLP), which refers to the ability of computers to understand and generate human language. Objective: This study aims to examine the potential for, and concerns of, using AI in scientific research. For this purpose, research articles were generated with ChatGPT; the quality of the generated reports was analyzed, along with the application's impact on the research framework, data analysis, and the literature review. The study also explored concerns around ownership and the integrity of research when using AI-generated text. Methods: A total of 4 articles were generated using ChatGPT, and thereafter evaluated by 23 reviewers. The researchers developed an evaluation form to assess the quality of the articles generated. Additionally, 50 abstracts were generated using ChatGPT and their quality was evaluated. The data were subjected to ANOVA and thematic analysis to analyze the qualitative data provided by the reviewers. Results: When using detailed prompts and providing the context of the study, ChatGPT would generate high-quality research that could be published in high-impact journals. 
However, ChatGPT had a minor impact on developing the research framework and data analysis. The primary area needing improvement was the development of the literature review. Moreover, reviewers expressed concerns around ownership and the integrity of the research when using AI-generated text. Nonetheless, ChatGPT has a strong potential to increase human productivity in research and can be used in academic writing. Conclusions: AI-generated text has the potential to improve the quality of high-impact research articles. The findings of this study suggest that decision makers and researchers should focus more on the methodology part of the research, which includes research design, developing research tools, and analyzing data in depth, to draw strong theoretical and practical implications, thereby establishing a revolution in scientific research in the era of AI. The practical implications of this study can be used in different fields such as medical education to deliver materials to develop the basic competencies for both medicine students and faculty members. 
UR - https://mededu.jmir.org/2023/1/e47049 UR - http://dx.doi.org/10.2196/47049 UR - http://www.ncbi.nlm.nih.gov/pubmed/37707884 ID - info:doi/10.2196/47049 ER - TY - JOUR AU - Fear, Kathleen AU - Gleber, Conrad PY - 2023/9/13 TI - Shaping the Future of Older Adult Care: ChatGPT, Advanced AI, and the Transformation of Clinical Practice JO - JMIR Aging SP - e51776 VL - 6 KW - generative AI KW - artificial intelligence KW - large language models KW - ChatGPT KW - Generative Pre-trained Transformer UR - https://aging.jmir.org/2023/1/e51776 UR - http://dx.doi.org/10.2196/51776 UR - http://www.ncbi.nlm.nih.gov/pubmed/37703085 ID - info:doi/10.2196/51776 ER - TY - JOUR AU - Sezgin, Emre AU - Chekeni, Faraaz AU - Lee, Jennifer AU - Keim, Sarah PY - 2023/9/11 TI - Clinical Accuracy of Large Language Models and Google Search Responses to Postpartum Depression Questions: Cross-Sectional Study JO - J Med Internet Res SP - e49240 VL - 25 KW - mental health KW - postpartum depression KW - health information seeking KW - large language model KW - GPT KW - LaMDA KW - Google KW - ChatGPT KW - artificial intelligence KW - natural language processing KW - generative AI KW - depression KW - cross-sectional study KW - clinical accuracy UR - https://www.jmir.org/2023/1/e49240 UR - http://dx.doi.org/10.2196/49240 UR - http://www.ncbi.nlm.nih.gov/pubmed/37695668 ID - info:doi/10.2196/49240 ER - TY - JOUR AU - Sallam, Malik AU - Salim, A. Nesreen AU - Barakat, Muna AU - Al-Mahzoum, Kholoud AU - Al-Tammemi, B. 
Ala'a AU - Malaeb, Diana AU - Hallit, Rabih AU - Hallit, Souheil PY - 2023/9/5 TI - Assessing Health Students' Attitudes and Usage of ChatGPT in Jordan: Validation Study JO - JMIR Med Educ SP - e48254 VL - 9 KW - artificial intelligence KW - machine learning KW - education KW - technology KW - healthcare KW - survey KW - opinion KW - knowledge KW - practices KW - KAP N2 - Background: ChatGPT is a conversational large language model that has the potential to revolutionize knowledge acquisition. However, the impact of this technology on the quality of education is still unknown considering the risks and concerns surrounding ChatGPT use. Therefore, it is necessary to assess the usability and acceptability of this promising tool. As an innovative technology, the intention to use ChatGPT can be studied in the context of the technology acceptance model (TAM). Objective: This study aimed to develop and validate a TAM-based survey instrument called TAME-ChatGPT (Technology Acceptance Model Edited to Assess ChatGPT Adoption) that could be employed to examine the successful integration and use of ChatGPT in health care education. Methods: The survey tool was created based on the TAM framework. It comprised 13 items for participants who heard of ChatGPT but did not use it and 23 items for participants who used ChatGPT. Using a convenience sampling approach, the survey link was circulated electronically among university students between February and March 2023. Exploratory factor analysis (EFA) was used to assess the construct validity of the survey instrument. Results: The final sample comprised 458 respondents, the majority of whom were undergraduate students (n=442, 96.5%). Only 109 (23.8%) respondents had heard of ChatGPT prior to participation and only 55 (11.3%) self-reported ChatGPT use before the study. 
EFA on the attitude and usage scales showed significant Bartlett tests of sphericity scores (P<.001) and adequate Kaiser-Meyer-Olkin measures (0.823 for the attitude scale and 0.702 for the usage scale), confirming the factorability of the correlation matrices. The EFA showed that 3 constructs explained a cumulative total of 69.3% variance in the attitude scale, and these subscales represented perceived risks, attitude to technology/social influence, and anxiety. For the ChatGPT usage scale, EFA showed that 4 constructs explained a cumulative total of 72% variance in the data and comprised the perceived usefulness, perceived risks, perceived ease of use, and behavior/cognitive factors. All the ChatGPT attitude and usage subscales showed good reliability with Cronbach α values >.78 for all the deduced subscales. Conclusions: The TAME-ChatGPT demonstrated good reliability, validity, and usefulness in assessing health care students' attitudes toward ChatGPT. The findings highlighted the importance of considering risk perceptions, usefulness, ease of use, attitudes toward technology, and behavioral factors when adopting ChatGPT as a tool in health care education. This information can aid the stakeholders in creating strategies to support the optimal and ethical use of ChatGPT and to identify the potential challenges hindering its successful implementation. Future research is recommended to guide the effective adoption of ChatGPT in health care education. 
UR - https://mededu.jmir.org/2023/1/e48254 UR - http://dx.doi.org/10.2196/48254 UR - http://www.ncbi.nlm.nih.gov/pubmed/37578934 ID - info:doi/10.2196/48254 ER - TY - JOUR AU - Roos, Jonas AU - Kasapovic, Adnan AU - Jansen, Tom AU - Kaczmarczyk, Robert PY - 2023/9/4 TI - Artificial Intelligence in Medical Education: Comparative Analysis of ChatGPT, Bing, and Medical Students in Germany JO - JMIR Med Educ SP - e46482 VL - 9 KW - medical education KW - state examinations KW - exams KW - large language models KW - artificial intelligence KW - ChatGPT N2 - Background: Large language models (LLMs) have demonstrated significant potential in diverse domains, including medicine. Nonetheless, there is a scarcity of studies examining their performance in medical examinations, especially those conducted in languages other than English, and in direct comparison with medical students. Analyzing the performance of LLMs in state medical examinations can provide insights into their capabilities and limitations and evaluate their potential role in medical education and examination preparation.  Objective: This study aimed to assess and compare the performance of 3 LLMs, GPT-4, Bing, and GPT-3.5-Turbo, in the German Medical State Examinations of 2022 and to evaluate their performance relative to that of medical students.  Methods: The LLMs were assessed on a total of 630 questions from the spring and fall German Medical State Examinations of 2022. The performance was evaluated with and without media-related questions. Statistical analyses included 1-way ANOVA and independent samples t tests for pairwise comparisons. The relative strength of the LLMs in comparison with that of the students was also evaluated.  Results: GPT-4 achieved the highest overall performance, correctly answering 88.1% of questions, closely followed by Bing (86.0%) and GPT-3.5-Turbo (65.7%). The students had an average correct answer rate of 74.6%. 
Both GPT-4 and Bing significantly outperformed the students in both examinations. When media questions were excluded, Bing achieved the highest performance of 90.7%, closely followed by GPT-4 (90.4%), while GPT-3.5-Turbo lagged (68.2%). There was a significant decline in the performance of GPT-4 and Bing in the fall 2022 examination, which was attributed to a higher proportion of media-related questions and a potential increase in question difficulty.  Conclusions: LLMs, particularly GPT-4 and Bing, demonstrate potential as valuable tools in medical education and for pretesting examination questions. Their high performance, even relative to that of medical students, indicates promising avenues for further development and integration into the educational and clinical landscape.  UR - https://mededu.jmir.org/2023/1/e46482 UR - http://dx.doi.org/10.2196/46482 UR - http://www.ncbi.nlm.nih.gov/pubmed/37665620 ID - info:doi/10.2196/46482 ER - TY - JOUR AU - Siglen, Elen AU - Vetti, Høberg Hildegunn AU - Augestad, Mirjam AU - Steen, M. 
Vidar AU - Lunde, Åshild AU - Bjorvatn, Cathrine PY - 2023/9/1 TI - Evaluation of the Rosa Chatbot Providing Genetic Information to Patients at Risk of Hereditary Breast and Ovarian Cancer: Qualitative Interview Study JO - J Med Internet Res SP - e46571 VL - 25 KW - chatbot KW - chatbots KW - genetic KW - trust KW - acceptability KW - perception KW - perceived KW - genetic counseling KW - hybrid health care KW - digital health tool KW - digital information tool KW - digital health technology KW - virtual assistant KW - hereditary breast and ovarian cancer KW - hereditary KW - genetic testing KW - technology KW - genetic clinic KW - digital tool KW - ovarian cancer KW - breast cancer KW - information retrieval KW - women's health KW - breast KW - ovarian KW - cancer KW - oncology KW - mobile phone N2 - Background: Genetic testing has become an integrated part of health care for patients with breast or ovarian cancer, and the increasing demand for genetic testing is accompanied by an increasing need for easy access to reliable genetic information for patients. Therefore, we developed a chatbot app (Rosa) that is able to perform humanlike digital conversations about genetic BRCA testing. Objective: Before implementing this new information service in daily clinical practice, we wanted to explore 2 aspects of chatbot use: the perceived utility and trust in chatbot technology among healthy patients at risk of hereditary cancer and how interaction with a chatbot regarding sensitive information about hereditary cancer influences patients. Methods: Overall, 175 healthy individuals at risk of hereditary breast and ovarian cancer were invited to test the chatbot, Rosa, before and after genetic counseling. To secure a varied sample, participants were recruited from all cancer genetic clinics in Norway, and the selection was based on age, gender, and risk of having a BRCA pathogenic variant. 
Among the 34.9% (61/175) of participants who consented to an individual interview, a selected subgroup (16/61, 26%) shared their experience through in-depth interviews via video. The semistructured interviews covered the following topics: usability, perceived usefulness, trust in the information received via the chatbot, how Rosa influenced the user, and thoughts about future use of digital tools in health care. The transcripts were analyzed using the stepwise-deductive inductive approach. Results: The overall finding was that the chatbot was very welcomed by the participants. They appreciated the 24/7 availability wherever they were and the possibility to use it to prepare for genetic counseling and to repeat and ask questions about what had been said afterward. As Rosa was created by health care professionals, they also valued the information they received as being medically correct. Rosa was referred to as being better than Google because it provided specific and reliable answers to their questions. The findings were summed up in 3 concepts: "Anytime, anywhere"; "In addition, not instead"; and "Trustworthy and true." All participants (16/16) denied increased worry after reading about genetic testing and hereditary breast and ovarian cancer in Rosa. Conclusions: Our results indicate that a genetic information chatbot has the potential to contribute to easy access to uniform information for patients at risk of hereditary breast and ovarian cancer, regardless of geographical location. The 24/7 availability of quality-assured information, tailored to the specific situation, had a reassuring effect on our participants. It was consistent across concepts that Rosa was a tool for preparation and repetition; however, none of the participants (0/16) felt that Rosa could replace genetic counseling if hereditary cancer was confirmed. This indicates that a chatbot can be a well-suited digital companion to genetic counseling. 
UR - https://www.jmir.org/2023/1/e46571 UR - http://dx.doi.org/10.2196/46571 UR - http://www.ncbi.nlm.nih.gov/pubmed/37656502 ID - info:doi/10.2196/46571 ER - TY - JOUR AU - Májovský, Martin AU - Mikolov, Tomas AU - Netuka, David PY - 2023/8/31 TI - AI Is Changing the Landscape of Academic Writing: What Can Be Done? Authors' Reply to: AI Increases the Pressure to Overhaul the Scientific Peer Review Process. Comment on "Artificial Intelligence Can Generate Fraudulent but Authentic-Looking Scientific Medical Articles: Pandora's Box Has Been Opened" JO - J Med Internet Res SP - e50844 VL - 25 KW - artificial intelligence KW - AI KW - publications KW - ethics KW - neurosurgery KW - ChatGPT KW - Chat Generative Pre-trained Transformer KW - language models KW - fraudulent medical articles UR - https://www.jmir.org/2023/1/e50844 UR - http://dx.doi.org/10.2196/50844 UR - http://www.ncbi.nlm.nih.gov/pubmed/37651175 ID - info:doi/10.2196/50844 ER - TY - JOUR AU - Liu, Nicholas AU - Brown, Amy PY - 2023/8/31 TI - AI Increases the Pressure to Overhaul the Scientific Peer Review Process. Comment on "Artificial Intelligence Can Generate Fraudulent but Authentic-Looking Scientific Medical Articles: Pandora's Box Has Been Opened" JO - J Med Internet Res SP - e50591 VL - 25 KW - artificial intelligence KW - AI KW - publications KW - ethics KW - neurosurgery KW - ChatGPT KW - Chat Generative Pre-trained Transformer KW - language models KW - fraudulent medical articles UR - https://www.jmir.org/2023/1/e50591 UR - http://dx.doi.org/10.2196/50591 UR - http://www.ncbi.nlm.nih.gov/pubmed/37651167 ID - info:doi/10.2196/50591 ER - TY - JOUR AU - Leung, I. 
Tiffany AU - de Azevedo Cardoso, Taiane AU - Mavragani, Amaryllis AU - Eysenbach, Gunther PY - 2023/8/31 TI - Best Practices for Using AI Tools as an Author, Peer Reviewer, or Editor JO - J Med Internet Res SP - e51584 VL - 25 KW - publishing KW - open access publishing KW - open science KW - publication policy KW - science editing KW - scholarly publishing KW - scientific publishing KW - research KW - scientific research KW - editorial KW - artificial intelligence KW - AI UR - https://www.jmir.org/2023/1/e51584 UR - http://dx.doi.org/10.2196/51584 UR - http://www.ncbi.nlm.nih.gov/pubmed/37651164 ID - info:doi/10.2196/51584 ER - TY - JOUR AU - Leung, I. Tiffany AU - Sagar, Ankita AU - Shroff, Swati AU - Henry, L. Tracey PY - 2023/8/23 TI - Can AI Mitigate Bias in Writing Letters of Recommendation? JO - JMIR Med Educ SP - e51494 VL - 9 KW - sponsorship KW - implicit bias KW - gender bias KW - bias KW - letters of recommendation KW - artificial intelligence KW - large language models KW - medical education KW - career advancement KW - tenure and promotion KW - promotion KW - leadership UR - https://mededu.jmir.org/2023/1/e51494 UR - http://dx.doi.org/10.2196/51494 UR - http://www.ncbi.nlm.nih.gov/pubmed/37610808 ID - info:doi/10.2196/51494 ER - TY - JOUR AU - Rao, Arya AU - Pang, Michael AU - Kim, John AU - Kamineni, Meghana AU - Lie, Winston AU - Prasad, K. Anoop AU - Landman, Adam AU - Dreyer, Keith AU - Succi, D. 
Marc PY - 2023/8/22 TI - Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow: Development and Usability Study JO - J Med Internet Res SP - e48659 VL - 25 KW - large language models KW - LLMs KW - artificial intelligence KW - AI KW - clinical decision support KW - clinical vignettes KW - ChatGPT KW - Generative Pre-trained Transformer KW - GPT KW - utility KW - development KW - usability KW - chatbot KW - accuracy KW - decision-making N2 - Background: Large language model (LLM)-based artificial intelligence chatbots direct the power of large training data sets toward successive, related tasks as opposed to single-ask tasks, for which artificial intelligence already achieves impressive performance. The capacity of LLMs to assist in the full scope of iterative clinical reasoning via successive prompting, in effect acting as artificial physicians, has not yet been evaluated. Objective: This study aimed to evaluate ChatGPT's capacity for ongoing clinical decision support via its performance on standardized clinical vignettes. Methods: We inputted all 36 published clinical vignettes from the Merck Sharp & Dohme (MSD) Clinical Manual into ChatGPT and compared its accuracy on differential diagnoses, diagnostic testing, final diagnosis, and management based on patient age, gender, and case acuity. Accuracy was measured by the proportion of correct responses to the questions posed within the clinical vignettes tested, as calculated by human scorers. We further conducted linear regression to assess the contributing factors toward ChatGPT's performance on clinical tasks. Results: ChatGPT achieved an overall accuracy of 71.7% (95% CI 69.3%-74.1%) across all 36 clinical vignettes. The LLM demonstrated the highest performance in making a final diagnosis with an accuracy of 76.9% (95% CI 67.8%-86.1%) and the lowest performance in generating an initial differential diagnosis with an accuracy of 60.3% (95% CI 54.2%-66.6%).
Compared to answering questions about general medical knowledge, ChatGPT demonstrated inferior performance on differential diagnosis (β=−15.8%; P<.001) and clinical management (β=−7.4%; P=.02) question types. Conclusions: ChatGPT achieves impressive accuracy in clinical decision-making, with increasing strength as it gains more clinical information at its disposal. In particular, ChatGPT demonstrates the greatest accuracy in tasks of final diagnosis as compared to initial diagnosis. Limitations include possible model hallucinations and the unclear composition of ChatGPT's training data set. UR - https://www.jmir.org/2023/1/e48659 UR - http://dx.doi.org/10.2196/48659 UR - http://www.ncbi.nlm.nih.gov/pubmed/37606976 ID - info:doi/10.2196/48659 ER - TY - JOUR AU - Hsu, Hsing-Yu AU - Hsu, Kai-Cheng AU - Hou, Shih-Yen AU - Wu, Ching-Lung AU - Hsieh, Yow-Wen AU - Cheng, Yih-Dih PY - 2023/8/21 TI - Examining Real-World Medication Consultations and Drug-Herb Interactions: ChatGPT Performance Evaluation JO - JMIR Med Educ SP - e48433 VL - 9 KW - ChatGPT KW - large language model KW - natural language processing KW - real-world medication consultation questions KW - NLP KW - drug-herb interactions KW - pharmacist KW - LLM KW - language models KW - chat generative pre-trained transformer N2 - Background: Since OpenAI released ChatGPT, with its strong capability in handling natural tasks and its user-friendly interface, it has garnered significant attention. Objective: A prospective analysis is required to evaluate the accuracy and appropriateness of medication consultation responses generated by ChatGPT. Methods: A prospective cross-sectional study was conducted by the pharmacy department of a medical center in Taiwan. The test data set comprised retrospective medication consultation questions collected from February 1, 2023, to February 28, 2023, along with common questions about drug-herb interactions.
Two distinct sets of questions were tested: real-world medication consultation questions and common questions about interactions between traditional Chinese and Western medicines. We used the conventional double-review mechanism. The appropriateness of each response from ChatGPT was assessed by 2 experienced pharmacists. In the event of a discrepancy between the assessments, a third pharmacist stepped in to make the final decision. Results: Of 293 real-world medication consultation questions, a random selection of 80 was used to evaluate ChatGPT's performance. ChatGPT exhibited a higher appropriateness rate in responding to public medication consultation questions compared to those asked by health care providers in a hospital setting (31/51, 61% vs 20/51, 39%; P=.01). Conclusions: The findings from this study suggest that ChatGPT could potentially be used for answering basic medication consultation questions. Our analysis of the erroneous information allowed us to identify potential medical risks associated with certain questions; this problem deserves our close attention. UR - https://mededu.jmir.org/2023/1/e48433 UR - http://dx.doi.org/10.2196/48433 UR - http://www.ncbi.nlm.nih.gov/pubmed/37561097 ID - info:doi/10.2196/48433 ER - TY - JOUR AU - Lee, Hyeonhoon PY - 2023/8/17 TI - Using ChatGPT as a Learning Tool in Acupuncture Education: Comparative Study JO - JMIR Med Educ SP - e47427 VL - 9 KW - ChatGPT KW - educational tool KW - artificial intelligence KW - acupuncture KW - AI KW - personalized education KW - students N2 - Background: ChatGPT (OpenAI) is a state-of-the-art artificial intelligence model with potential applications in the medical fields of clinical practice, research, and education. Objective: This study aimed to evaluate the potential of ChatGPT as an educational tool in college acupuncture programs, focusing on its ability to support students in learning acupuncture point selection, treatment planning, and decision-making.
Methods: We collected case studies published in Acupuncture in Medicine between June 2022 and May 2023. Both ChatGPT-3.5 and ChatGPT-4 were used to generate suggestions for acupuncture points based on case presentations. A Wilcoxon signed-rank test was conducted to compare the number of acupuncture points generated by ChatGPT-3.5 and ChatGPT-4, and the overlapping ratio of acupuncture points was calculated. Results: Among the 21 case studies, 14 studies were included for analysis. ChatGPT-4 generated significantly more acupuncture points (9.0, SD 1.1) compared to ChatGPT-3.5 (5.6, SD 0.6; P<.001). The overlapping ratios of acupuncture points for ChatGPT-3.5 (0.40, SD 0.28) and ChatGPT-4 (0.34, SD 0.27; P=.67) were not significantly different. Conclusions: ChatGPT may be a useful educational tool for acupuncture students, providing valuable insights into personalized treatment plans. However, it cannot fully replace traditional diagnostic methods, and further studies are needed to ensure its safe and effective implementation in acupuncture education. UR - https://mededu.jmir.org/2023/1/e47427 UR - http://dx.doi.org/10.2196/47427 UR - http://www.ncbi.nlm.nih.gov/pubmed/37590034 ID - info:doi/10.2196/47427 ER - TY - JOUR AU - Au, Jessica AU - Falloon, Caitlin AU - Ravi, Ayngaran AU - Ha, Phil AU - Le, Suong PY - 2023/8/15 TI - A Beta-Prototype Chatbot for Increasing Health Literacy of Patients With Decompensated Cirrhosis: Usability Study JO - JMIR Hum Factors SP - e42506 VL - 10 KW - chronic liver disease KW - chatbot KW - artificial intelligence KW - health literacy KW - acceptability N2 - Background: Health literacy is low among patients with chronic liver disease (CLD) and associated with poor health outcomes and increased health care use. Lucy LiverBot, an artificial intelligence chatbot, was created by a multidisciplinary team at Monash Health, Australia, to improve health literacy and self-efficacy in patients with decompensated CLD.
Objective: The aim of this study was to explore users' experience with Lucy LiverBot using an unmoderated, in-person, qualitative test. Methods: Lucy LiverBot is a simple, low-cost, and scalable digital intervention, which was at the beta prototype development phase at the time of usability testing. The concept and prototype development was realized in 2 phases: concept development and usability testing. We conducted a mixed methods study to assess usability of Lucy LiverBot as a tool for health literacy education among ambulatory and hospitalized patients with decompensated CLD at Monash Health. Patients were provided with free rein to interact with Lucy LiverBot on an iPad device under moderator observation. A 3-part survey (preuser, user, and postuser) was developed using the Unified Acceptance Theory Framework to capture the user experience. Results: There were 20 participants with a median age of 55.5 (IQR 46.0-60.5) years, 55% (n=11) of them were female, and 85% (n=17) of them were White. In total, 35% (n=7) of them reported having difficulty reading and understanding written medical information. Alcohol was the predominant etiology in 70% (n=14) of users. Participants actively engaged with Lucy LiverBot and identified it as a potential educational tool and device that could act as a social companion to improve well-being. In total, 25% (n=5) of them reported finding it difficult to learn about their health problems and 20% (n=4) of them found it difficult to find medical information they could trust. Qualitative interviews revealed the conversational nature of Lucy LiverBot was considered highly appealing with improvement in mental health and well-being reported as an unintended benefit of Lucy LiverBot. Patients who had been managing their liver cirrhosis for several years identified that they would be less likely to use Lucy LiverBot, but that it would have been more useful at the time of their diagnosis.
Overall, Lucy LiverBot was perceived as a reliable and trustworthy source of information. Conclusions: Lucy LiverBot was well received and may be used to improve health literacy and address barriers to health care provision in patients with decompensated CLD. The study revealed important feedback that has been used to further optimize Lucy LiverBot. Further acceptability and validation studies are being undertaken to investigate whether Lucy LiverBot can improve clinical outcomes and health-related quality of life in patients with decompensated CLD. UR - https://humanfactors.jmir.org/2023/1/e42506 UR - http://dx.doi.org/10.2196/42506 UR - http://www.ncbi.nlm.nih.gov/pubmed/37581920 ID - info:doi/10.2196/42506 ER - TY - JOUR AU - Safranek, W. Conrad AU - Sidamon-Eristoff, Elizabeth Anne AU - Gilson, Aidan AU - Chartash, David PY - 2023/8/14 TI - The Role of Large Language Models in Medical Education: Applications and Implications JO - JMIR Med Educ SP - e50945 VL - 9 KW - large language models KW - ChatGPT KW - medical education KW - LLM KW - artificial intelligence in health care KW - AI KW - autoethnography UR - https://mededu.jmir.org/2023/1/e50945 UR - http://dx.doi.org/10.2196/50945 UR - http://www.ncbi.nlm.nih.gov/pubmed/37578830 ID - info:doi/10.2196/50945 ER - TY - JOUR AU - Sharp, Gemma AU - Torous, John AU - West, L.
Madeline PY - 2023/8/14 TI - Ethical Challenges in AI Approaches to Eating Disorders JO - J Med Internet Res SP - e50696 VL - 25 KW - eating disorders KW - body image KW - artificial intelligence KW - AI KW - chatbot KW - ethics UR - https://www.jmir.org/2023/1/e50696 UR - http://dx.doi.org/10.2196/50696 UR - http://www.ncbi.nlm.nih.gov/pubmed/37578836 ID - info:doi/10.2196/50696 ER - TY - JOUR AU - Shao, Chen-ye AU - Li, Hui AU - Liu, Xiao-long AU - Li, Chang AU - Yang, Li-qin AU - Zhang, Yue-juan AU - Luo, Jing AU - Zhao, Jun PY - 2023/8/14 TI - Appropriateness and Comprehensiveness of Using ChatGPT for Perioperative Patient Education in Thoracic Surgery in Different Language Contexts: Survey Study JO - Interact J Med Res SP - e46900 VL - 12 KW - patient education KW - ChatGPT KW - Generative Pre-trained Transformer KW - thoracic surgery KW - evaluation KW - patient KW - education KW - surgery KW - thoracic KW - language KW - language model KW - clinical workflow KW - artificial intelligence KW - AI KW - workflow KW - communication KW - feasibility N2 - Background: ChatGPT, a dialogue-based artificial intelligence language model, has shown promise in assisting clinical workflows and patient-clinician communication. However, there is a lack of feasibility assessments regarding its use for perioperative patient education in thoracic surgery. Objective: This study aimed to assess the appropriateness and comprehensiveness of using ChatGPT for perioperative patient education in thoracic surgery in both English and Chinese contexts. Methods: This pilot study was conducted in February 2023. A total of 37 questions focused on perioperative patient education in thoracic surgery were created based on guidelines and clinical experience. Two sets of inquiries were made to ChatGPT for each question, one in English and the other in Chinese. 
The responses generated by ChatGPT were evaluated separately by experienced thoracic surgical clinicians for appropriateness and comprehensiveness based on a hypothetical draft response to a patient's question on the electronic information platform. For a response to be qualified, it required at least 80% of reviewers to deem it appropriate and 50% to deem it comprehensive. Statistical analyses were performed using the unpaired chi-square test or Fisher exact test, with a significance level set at P<.05. Results: The set of 37 commonly asked questions covered topics such as disease information, diagnostic procedures, perioperative complications, treatment measures, disease prevention, and perioperative care considerations. In both the English and Chinese contexts, 34 (92%) out of 37 responses were qualified in terms of both appropriateness and comprehensiveness. The remaining 3 (8%) responses were unqualified in these 2 contexts. The unqualified responses primarily involved the diagnosis of disease symptoms and surgical-related complications symptoms. The reasons for determining the responses as unqualified were similar in both contexts. There was no statistically significant difference (34/37, 92% vs 34/37, 92%; P=.99) in the qualification rate between the 2 language sets. Conclusions: This pilot study demonstrates the potential feasibility of using ChatGPT for perioperative patient education in thoracic surgery in both English and Chinese contexts. ChatGPT is expected to enhance patient satisfaction, reduce anxiety, and improve compliance during the perioperative period. In the future, there will be remarkable potential application for using artificial intelligence, in conjunction with human review, for patient education and health consultation after patients have provided their informed consent.
UR - https://www.i-jmr.org/2023/1/e46900 UR - http://dx.doi.org/10.2196/46900 UR - http://www.ncbi.nlm.nih.gov/pubmed/37578819 ID - info:doi/10.2196/46900 ER - TY - JOUR AU - Wang, Changyu AU - Liu, Siru AU - Yang, Hao AU - Guo, Jiulin AU - Wu, Yuxuan AU - Liu, Jialin PY - 2023/8/11 TI - Ethical Considerations of Using ChatGPT in Health Care JO - J Med Internet Res SP - e48009 VL - 25 KW - ethics KW - ChatGPT KW - artificial intelligence KW - AI KW - large language models KW - health care KW - artificial intelligence development KW - development KW - algorithm KW - patient safety KW - patient privacy KW - safety KW - privacy UR - https://www.jmir.org/2023/1/e48009 UR - http://dx.doi.org/10.2196/48009 UR - http://www.ncbi.nlm.nih.gov/pubmed/37566454 ID - info:doi/10.2196/48009 ER - TY - JOUR AU - Li, Qingchuan AU - Luximon, Yan AU - Zhang, Jiaxin PY - 2023/8/10 TI - The Influence of Anthropomorphic Cues on Patients' Perceived Anthropomorphism, Social Presence, Trust Building, and Acceptance of Health Care Conversational Agents: Within-Subject Web-Based Experiment JO - J Med Internet Res SP - e44479 VL - 25 KW - anthropomorphic cues KW - intelligent guidance conversational agents KW - social presence KW - trust KW - technology acceptance KW - mindful and mindless anthropomorphism N2 - Background: The last decade has witnessed the rapid development of health care conversational agents (CAs); however, there are still great challenges in making health care CAs trustworthy and acceptable to patients. Objective: Focusing on intelligent guidance CAs, a type of health care CA for web-based patient triage, this study aims to investigate how anthropomorphic cues influence patients' perceived anthropomorphism and social presence of such CAs and evaluate how these perceptions facilitate their trust-building process and acceptance behavior. Methods: To test the research hypotheses, the video vignette methodology was used to evaluate patients'
perceptions and acceptance of various intelligent guidance CAs. The anthropomorphic cues of CAs were manipulated in a 3×2 within-subject factorial experiment with 103 participants, with the factors of agent appearance (high, medium, and low anthropomorphic levels) and verbal cues (humanlike and machine-like verbal cues) as the within-subject variables. Results: The 2-way repeated measures ANOVA analysis indicated that the higher anthropomorphic level of agent appearance significantly increased mindful anthropomorphism (high level>medium level: 4.57 vs 4.27; P=.01; high level>low level: 4.57 vs 4.04; P<.001; medium level>low level: 4.27 vs 4.04; P=.04), mindless anthropomorphism (high level>medium level: 5.39 vs 5.01; P<.001; high level>low level: 5.39 vs 4.85; P<.001), and social presence (high level>medium level: 5.19 vs 4.83; P<.001; high level>low level: 5.19 vs 4.72; P<.001), and the higher anthropomorphic level of verbal cues significantly increased mindful anthropomorphism (4.83 vs 3.76; P<.001), mindless anthropomorphism (5.60 vs 4.57; P<.001), and social presence (5.41 vs 4.41; P<.001). Meanwhile, a significant interaction between agent appearance and verbal cues (P=.004) was revealed. Second, the partial least squares results indicated that privacy concerns were negatively influenced by social presence (β=−.375; t312=4.494) and mindful anthropomorphism (β=−.112; t312=1.970). Privacy concerns (β=−.273; t312=9.558), social presence (β=.265; t312=4.314), and mindless anthropomorphism (β=.405; t312=7.145) predicted the trust in CAs, which further promoted the intention to disclose information (β=.675; t312=21.163), the intention to continuously use CAs (β=.190; t312=4.874), and satisfaction (β=.818; t312=46.783). Conclusions: The findings show that a high anthropomorphic level of agent appearance and verbal cues could improve the perceptions of mindful anthropomorphism and mindless anthropomorphism as well as social presence.
Furthermore, mindless anthropomorphism and social presence significantly promoted patients' trust in CAs, and mindful anthropomorphism and social presence decreased privacy concerns. It is also worth noting that trust was an important antecedent and determinant of patients' acceptance of CAs, including their satisfaction, intention to disclose information, and intention to continuously use CAs. UR - https://www.jmir.org/2023/1/e44479 UR - http://dx.doi.org/10.2196/44479 UR - http://www.ncbi.nlm.nih.gov/pubmed/37561567 ID - info:doi/10.2196/44479 ER - TY - JOUR AU - Mills, Rhiana AU - Mangone, Rose Emily AU - Lesh, Neal AU - Mohan, Diwakar AU - Baraitser, Paula PY - 2023/8/9 TI - Chatbots to Improve Sexual and Reproductive Health: Realist Synthesis JO - J Med Internet Res SP - e46761 VL - 25 KW - chatbot KW - sexual and reproductive health KW - realist synthesis KW - social networks KW - service networks KW - disclosure KW - artificial intelligence KW - sexual KW - reproductive KW - social media KW - counseling KW - treatment KW - development KW - theory KW - digital device KW - device N2 - Background: Digital technologies may improve sexual and reproductive health (SRH) across diverse settings. Chatbots are computer programs designed to simulate human conversation, and there is a growing interest in the potential for chatbots to provide responsive and accurate information, counseling, linkages to products and services, or a companion on an SRH journey. Objective: This review aimed to identify assumptions about the value of chatbots for SRH and collate the evidence to support them. Methods: We used a realist approach that starts with an initial program theory and generates causal explanations in the form of context, mechanism, and outcome configurations to test and develop that theory. We generated our program theory, drawing on the expertise of the research team, and then searched the literature to add depth and develop this theory with evidence.
Results: The evidence supports our program theory, which suggests that chatbots are a promising intervention for SRH information and service delivery. This is because chatbots offer anonymous and nonjudgmental interactions that encourage disclosure of personal information, provide complex information in a responsive and conversational tone that increases understanding, link to SRH conversations within web-based and offline social networks, provide immediate support or service provision 24/7 by automating some tasks, and provide the potential to develop long-term relationships with users who return over time. However, chatbots may be less valuable where people find any conversation about SRH (even with a chatbot) stigmatizing, for those who lack confidential access to digital devices, where conversations do not feel natural, and where chatbots are developed as stand-alone interventions without reference to service contexts. Conclusions: Chatbots in SRH could be developed further to automate simple tasks and support service delivery. They should prioritize achieving an authentic conversational tone, which could be developed to facilitate content sharing in social networks, should support long-term relationship building with their users, and should be integrated into wider service networks. 
UR - https://www.jmir.org/2023/1/e46761 UR - http://dx.doi.org/10.2196/46761 UR - http://www.ncbi.nlm.nih.gov/pubmed/37556194 ID - info:doi/10.2196/46761 ER - TY - JOUR AU - Avramović, Petra AU - Rietdijk, Rachael AU - Kenny, Belinda AU - Power, Emma AU - Togher, Leanne PY - 2023/8/9 TI - Developing a Digital Health Intervention for Conversation Skills After Brain Injury (convers-ABI-lity) Using a Collaborative Approach: Mixed Methods Study JO - J Med Internet Res SP - e45240 VL - 25 KW - brain injury KW - cognitive-communication KW - communication partner training KW - digital health KW - co-design N2 - Background: People with acquired brain injury (ABI) experience communication breakdown in everyday interactions many years after injury, negatively impacting social and vocational relationships. Communication partner training (CPT) is a recommended intervention approach in communication rehabilitation after ABI. Access to long-term services is essential, both in rural and remote locations. Digital health has potential to overcome the challenges of travel and improve cost efficiencies, processes, and clinical outcomes. Objective: We aimed to collaboratively develop a novel, multimodal web-based CPT intervention (convers-ABI-lity) with key stakeholders and evaluate its feasibility for improving conversation skills after brain injury. Methods: This mixed methods study consisted of 3 key stages guided by the Integrate, Design, Assess, and Share (IDEAS) framework for developing effective digital health interventions. Stage 1 included the integration of current end-user needs and perspectives with key treatment and theoretical components of existing evidence-based interventions, TBI Express and TBIconneCT. Stage 2 included the iterative design of convers-ABI-lity with feedback from end-user interviews (n=22) analyzed using content analysis. Participants were individuals with ABI, family members, health professionals, and paid support workers.
Stage 3 included the evaluation of the feasibility through a proof-of-concept study (n=3). A total of 3 dyads (a person with ABI and their communication partner [CP]) completed 7 weeks of convers-ABI-lity, guided by a clinician. The outcome measures included blinded ratings of conversation samples and self-report measures. We analyzed postintervention participant interviews using content analysis to inform further intervention refinement and development. Results: Collaborative and iterative design and development during stages 1 and 2 resulted in the development of convers-ABI-lity. Results in stage 3 indicated positive changes in the blinded ratings of conversation samples for the participants with traumatic brain injury and their CPs. Statistically reliable positive changes were also observed in the self-report measures of social communication skills and quality of life. Intervention participants endorsed aspects of convers-ABI-lity, such as its complementary nature, self-guided web-based modules, clinician sessions, engaging content, and novel features. They reported the intervention to be relevant to their personal experience with cognitive-communication disorders. Conclusions: This study presents the outcome of using the IDEAS framework to guide the development of a web-based multimodal CPT intervention with input from key stakeholders. The results indicate promising outcomes for improving the conversation skills of people with ABI and their CPs. Further evaluation of intervention effectiveness and efficacy using a larger sample size is required. UR - https://www.jmir.org/2023/1/e45240 UR - http://dx.doi.org/10.2196/45240 UR - http://www.ncbi.nlm.nih.gov/pubmed/37556179 ID - info:doi/10.2196/45240 ER - TY - JOUR AU - Borchert, J. Robin AU - Hickman, R. Charlotte AU - Pepys, Jack AU - Sadler, J. 
Timothy PY - 2023/8/7 TI - Performance of ChatGPT on the Situational Judgement Test: A Professional Dilemmas-Based Examination for Doctors in the United Kingdom JO - JMIR Med Educ SP - e48978 VL - 9 KW - ChatGPT KW - language models KW - Situational Judgement Test KW - medical education KW - artificial intelligence KW - language model KW - exam KW - examination KW - SJT KW - judgement KW - reasoning KW - communication KW - chatbot N2 - Background: ChatGPT is a large language model that has performed well on professional examinations in the fields of medicine, law, and business. However, it is unclear how ChatGPT would perform on an examination assessing professionalism and situational judgement for doctors. Objective: We evaluated the performance of ChatGPT on the Situational Judgement Test (SJT): a national examination taken by all final-year medical students in the United Kingdom. This examination is designed to assess attributes such as communication, teamwork, patient safety, prioritization skills, professionalism, and ethics. Methods: All questions from the UK Foundation Programme Office's (UKFPO's) 2023 SJT practice examination were inputted into ChatGPT. For each question, ChatGPT's answers and rationales were recorded and assessed on the basis of the official UK Foundation Programme Office scoring template. Questions were categorized into domains of Good Medical Practice on the basis of the domains referenced in the rationales provided in the scoring sheet. Questions without clear domain links were screened by reviewers and assigned one or multiple domains. ChatGPT's overall performance, as well as its performance across the domains of Good Medical Practice, was evaluated. Results: Overall, ChatGPT performed well, scoring 76% on the SJT but scoring full marks on only a few questions (9%), which may reflect possible flaws in ChatGPT's situational judgement or inconsistencies in the reasoning across questions (or both) in the examination itself.
ChatGPT demonstrated consistent performance across the 4 outlined domains in Good Medical Practice for doctors. Conclusions: Further research is needed to understand the potential applications of large language models, such as ChatGPT, in medical education for standardizing questions and providing consistent rationales for examinations assessing professionalism and ethics. UR - https://mededu.jmir.org/2023/1/e48978 UR - http://dx.doi.org/10.2196/48978 UR - http://www.ncbi.nlm.nih.gov/pubmed/37548997 ID - info:doi/10.2196/48978 ER - TY - JOUR AU - Viduani, Anna AU - Cosenza, Victor AU - Fisher, L. Helen AU - Buchweitz, Claudia AU - Piccin, Jader AU - Pereira, Rivka AU - Kohrt, A. Brandon AU - Mondelli, Valeria AU - van Heerden, Alastair AU - Araújo, Matsumura Ricardo AU - Kieling, Christian PY - 2023/8/7 TI - Assessing Mood With the Identifying Depression Early in Adolescence Chatbot (IDEABot): Development and Implementation Study JO - JMIR Hum Factors SP - e44388 VL - 10 KW - depression KW - adolescent KW - ambulatory assessment KW - chatbot KW - smartphone KW - digital mental health KW - mobile phone N2 - Background: Mental health status assessment is mostly limited to clinical or research settings, but recent technological advances provide new opportunities for measurement using more ecological approaches. Leveraging apps already in use by individuals on their smartphones, such as chatbots, could be a useful approach to capture subjective reports of mood in the moment. Objective: This study aimed to describe the development and implementation of the Identifying Depression Early in Adolescence Chatbot (IDEABot), a WhatsApp-based tool designed for collecting intensive longitudinal data on adolescents' mood. Methods: The IDEABot was developed to collect data from Brazilian adolescents via WhatsApp as part of the Identifying Depression Early in Adolescence Risk Stratified Cohort (IDEA-RiSCo) study.
It supports the administration and collection of self-reported structured items or questionnaires and audio responses. The development explored WhatsApp's default features, such as emojis and recorded audio messages, and focused on scripting relevant and acceptable conversations. The IDEABot supports 5 types of interactions: textual and audio questions, administration of a version of the Short Mood and Feelings Questionnaire, unprompted interactions, and a snooze function. Six adolescents (n=4, 67% male participants and n=2, 33% female participants) aged 16 to 18 years tested the initial version of the IDEABot and were engaged to codevelop the final version of the app. The IDEABot was subsequently used for data collection in the second- and third-year follow-ups of the IDEA-RiSCo study. Results: The adolescents assessed the initial version of the IDEABot as enjoyable and made suggestions for improvements that were subsequently implemented. The IDEABot's final version follows a structured script with the choice of answer based on exact text matches throughout 15 days. The implementation of the IDEABot in 2 waves of the IDEA-RiSCo sample (140 and 132 eligible adolescents in the second- and third-year follow-ups, respectively) evidenced adequate engagement indicators, with good acceptance for using the tool (113/140, 80.7% and 122/132, 92.4% for second- and third-year follow-up use, respectively), low attrition (only 1/113, 0.9% and 1/122, 0.8%, respectively, failed to engage in the protocol after initial interaction), and high compliance in terms of the proportion of responses in relation to the total number of elicited prompts (12.8, SD 3.5; 91% out of 14 possible interactions and 10.57, SD 3.4; 76% out of 14 possible interactions, respectively). Conclusions: The IDEABot is a frugal app that leverages an existing app already in daily use by our target population.
It follows a simple rule-based approach that can be easily tested and implemented in diverse settings and possibly diminishes the burden of intensive data collection for participants by repurposing WhatsApp. In this context, the IDEABot appears as an acceptable and potentially scalable tool for gathering momentary information that can enhance our understanding of mood fluctuations and development. UR - https://humanfactors.jmir.org/2023/1/e44388 UR - http://dx.doi.org/10.2196/44388 UR - http://www.ncbi.nlm.nih.gov/pubmed/37548996 ID - info:doi/10.2196/44388 ER - TY - JOUR AU - Pillai, Malvika AU - Griffin, C. Ashley AU - Kronk, A. Clair AU - McCall, Terika PY - 2023/8/4 TI - Toward Community-Based Natural Language Processing (CBNLP): Cocreating With Communities JO - J Med Internet Res SP - e48498 VL - 25 KW - ChatGPT KW - natural language processing KW - community-based participatory research KW - research design KW - artificial intelligence KW - participatory KW - co-design KW - machine learning KW - co-creation KW - community based KW - lived experience KW - lived experiences KW - collaboration KW - collaborative UR - https://www.jmir.org/2023/1/e48498 UR - http://dx.doi.org/10.2196/48498 UR - http://www.ncbi.nlm.nih.gov/pubmed/37540551 ID - info:doi/10.2196/48498 ER - TY - JOUR AU - Stewart, Ian AU - Welch, Charles AU - An, Lawrence AU - Resnicow, Ken AU - Pennebaker, James AU - Mihalcea, Rada PY - 2023/8/1 TI - Expressive Interviewing Agents to Support Health-Related Behavior Change: Randomized Controlled Study of COVID-19 Behaviors JO - JMIR Form Res SP - e40277 VL - 7 KW - expressive writing KW - motivational interviewing KW - dialogue systems KW - counseling KW - behavior change KW - text analysis KW - COVID-19 KW - mental health KW - automated writing KW - writing system KW - stress KW - psychological health N2 - Background: Expressive writing and motivational interviewing are well-known approaches to help patients cope with stressful life events. 
Although these methods are often applied by human counselors, it is less well understood if an automated artificial intelligence approach can benefit patients. Providing an automated method would help expose a wider range of people to the possible benefits of motivational interviewing, with lower cost and more adaptability to sudden events like the COVID-19 pandemic. Objective: This study presents an automated writing system and evaluates possible outcomes among participants with respect to behavior related to the COVID-19 pandemic. Methods: We developed a rule-based dialogue system for "Expressive Interviewing" to elicit writing from participants on the subject of how COVID-19 has impacted their lives. The system prompts participants to describe their life experiences and emotions and provides topic-specific prompts in response to participants' use of topical keywords. In May 2021 and June 2021, we recruited participants (N=151) via Prolific to complete either the Expressive Interviewing task or a control task. We surveyed participants immediately before the intervention, immediately after the intervention, and again 2 weeks after the intervention. We measured participants' self-reported stress, general mental health, COVID-19–related health behavior, and social behavior. Results: Participants generally wrote long responses during the task (53.3 words per response). In aggregate, task participants experienced a significant decrease in stress in the short term (~23% decrease, P<.001) and a slight difference in social activity compared with the control group (P=.03). No significant differences in short-term or long-term outcomes were detected between participant subgroups (eg, male versus female participants) except for some within-condition differences by ethnicity (eg, higher social activity among African American people participating in Expressive Interviewing vs participants of other ethnicities). 
For short-term effects, participants showed different outcomes based on their writing. Using more anxiety-related words was correlated with a greater short-term decrease in stress (r=−0.264, P<.001), and using more positive emotion words was correlated with a more meaningful experience (r=0.243, P=.001). As for long-term effects, writing with more lexical diversity was correlated with an increase in social activity (r=0.266, P<.001). Conclusions: Expressive Interviewing participants exhibited short-term, but not long-term, positive changes in mental health, and some linguistic metrics of writing style were correlated with positive change in behavior. Although there were no significant long-term effects observed, the positive short-term effects suggest that the Expressive Interviewing intervention could be used in cases in which a patient lacks access to traditional therapy and needs a short-term solution. Trial Registration: ClinicalTrials.gov NCT05949840; https://www.clinicaltrials.gov/study/NCT05949840 UR - https://formative.jmir.org/2023/1/e40277 UR - http://dx.doi.org/10.2196/40277 UR - http://www.ncbi.nlm.nih.gov/pubmed/37074948 ID - info:doi/10.2196/40277 ER - TY - JOUR AU - Fournier-Tombs, Eleonore AU - McHardy, Juliette PY - 2023/7/26 TI - A Medical Ethics Framework for Conversational Artificial Intelligence JO - J Med Internet Res SP - e43068 VL - 25 KW - chatbot KW - medicine KW - ethics KW - AI ethics KW - AI policy KW - conversational agent KW - COVID-19 KW - risk KW - medical ethics KW - privacy KW - data governance KW - artificial intelligence UR - https://www.jmir.org/2023/1/e43068 UR - http://dx.doi.org/10.2196/43068 UR - http://www.ncbi.nlm.nih.gov/pubmed/37224277 ID - info:doi/10.2196/43068 ER - TY - JOUR AU - Hristidis, Vagelis AU - Ruggiano, Nicole AU - Brown, L. 
Ellen AU - Ganta, Reddy Sai Rithesh AU - Stewart, Selena PY - 2023/7/25 TI - ChatGPT vs Google for Queries Related to Dementia and Other Cognitive Decline: Comparison of Results JO - J Med Internet Res SP - e48966 VL - 25 KW - chatbots KW - large language models KW - ChatGPT KW - web search KW - language model KW - Google KW - aging KW - cognitive KW - cognition KW - dementia KW - gerontology KW - geriatric KW - geriatrics KW - query KW - queries KW - information seeking KW - search N2 - Background: People living with dementia or other cognitive decline and their caregivers (PLWD) increasingly rely on the web to find information about their condition and available resources and services. The recent advancements in large language models (LLMs), such as ChatGPT, provide a new alternative to the more traditional web search engines, such as Google. Objective: This study compared the quality of the results of ChatGPT and Google for a collection of PLWD-related queries. Methods: A set of 30 informational and 30 service delivery (transactional) PLWD-related queries were selected and submitted to both Google and ChatGPT. Three domain experts assessed the results for their currency of information, reliability of the source, objectivity, relevance to the query, and similarity of their response. The readability of the results was also analyzed. Interrater reliability coefficients were calculated for all outcomes. Results: Google had superior currency and higher reliability. ChatGPT results were evaluated as more objective. ChatGPT had a significantly higher response relevance, while Google often drew upon sources that were referral services for dementia care or service providers themselves. The readability was low for both platforms, especially for ChatGPT (mean grade level 12.17, SD 1.94) compared to Google (mean grade level 9.86, SD 3.47). 
The similarity between the content of ChatGPT and Google responses was rated as high for 13 (21.7%) responses, medium for 16 (26.7%) responses, and low for 31 (51.6%) responses. Conclusions: Both Google and ChatGPT have strengths and weaknesses. ChatGPT rarely includes the source of a result. Google more often provides a date for and a known reliable source of the response compared to ChatGPT, whereas ChatGPT supplies more relevant responses to queries. The results of ChatGPT may be out of date and often do not specify a validity time stamp. Google sometimes returns results based on commercial entities. The readability scores for both indicate that responses are often not appropriate for persons with low health literacy skills. In the future, the addition of both the source and the date of health-related information and availability in other languages may increase the value of these platforms for both nonmedical and medical professionals. UR - https://www.jmir.org/2023/1/e48966 UR - http://dx.doi.org/10.2196/48966 UR - http://www.ncbi.nlm.nih.gov/pubmed/37490317 ID - info:doi/10.2196/48966 ER - TY - JOUR AU - Lin, Xiaowen AU - Martinengo, Laura AU - Jabir, Ishqi Ahmad AU - Ho, Yan Andy Hau AU - Car, Josip AU - Atun, Rifat AU - Tudor Car, Lorainne PY - 2023/7/18 TI - Scope, Characteristics, Behavior Change Techniques, and Quality of Conversational Agents for Mental Health and Well-Being: Systematic Assessment of Apps JO - J Med Internet Res SP - e45984 VL - 25 KW - conversational agent KW - chatbot KW - mental health KW - mobile health KW - mHealth KW - behavior change KW - apps KW - Mobile Application Rating Scale KW - MARS KW - mobile phone N2 - Background: Mental disorders cause substantial health-related burden worldwide. Mobile health interventions are increasingly being used to promote mental health and well-being, as they could improve access to treatment and reduce associated costs. 
Behavior change is an important feature of interventions aimed at improving mental health and well-being. There is a need to discern the active components that can promote behavior change in such interventions and ultimately improve users' mental health. Objective: This study systematically identified mental health conversational agents (CAs) currently available in app stores and assessed the behavior change techniques (BCTs) used. We further described their main features, technical aspects, and quality in terms of engagement, functionality, esthetics, and information using the Mobile Application Rating Scale. Methods: The search, selection, and assessment of apps were adapted from a systematic review methodology and included a search, 2 rounds of selection, and an evaluation following predefined criteria. We conducted a systematic app search of Apple's App Store and Google Play using 42matters. Apps with CAs in English that were uploaded or updated from January 2020 and provided interventions aimed at improving mental health and well-being and the assessment or management of mental disorders were tested by at least 2 reviewers. The BCT taxonomy v1, a comprehensive list of 93 BCTs, was used to identify the specific behavior change components in CAs. Results: We found 18 app-based mental health CAs. Most CAs had <1000 user ratings on both app stores (12/18, 67%) and targeted several conditions such as stress, anxiety, and depression (13/18, 72%). All CAs addressed >1 mental disorder. Most CAs (14/18, 78%) used cognitive behavioral therapy (CBT). Half (9/18, 50%) of the CAs identified were rule based (ie, only offered predetermined answers) and the other half (9/18, 50%) were artificial intelligence enhanced (ie, included open-ended questions). CAs used 48 different BCTs and included on average 15 (SD 8.77; range 4-30) BCTs. The most common BCTs were 3.3 "Social support (emotional)," 4.1 "Instructions for how to perform a behavior," 11.2 "Reduce negative emotions," 
and 6.1 "Demonstration of the behavior." One-third (5/14, 36%) of the CAs claiming to be CBT based did not include core CBT concepts. Conclusions: Mental health CAs mostly targeted various mental health issues such as stress, anxiety, and depression, reflecting a broad intervention focus. The most common BCTs identified serve to promote the self-management of mental disorders with few therapeutic elements. CA developers should consider the quality of information, user confidentiality, access, and emergency management when designing mental health CAs. Future research should assess the role of artificial intelligence in promoting behavior change within CAs and determine the choice of BCTs in evidence-based psychotherapies to enable systematic, consistent, and transparent development and evaluation of effective digital mental health interventions. UR - https://www.jmir.org/2023/1/e45984 UR - http://dx.doi.org/10.2196/45984 UR - http://www.ncbi.nlm.nih.gov/pubmed/37463036 ID - info:doi/10.2196/45984 ER - TY - JOUR AU - Darien, Kaja AU - Lee, Susan AU - Knowles, Kayla AU - Wood, Sarah AU - Langer, D. 
Miriam AU - Lazar, Nellie AU - Dowshen, Nadia PY - 2023/7/18 TI - Health Information From Web Search Engines and Virtual Assistants About Pre-Exposure Prophylaxis for HIV Prevention in Adolescents and Young Adults: Content Analysis JO - JMIR Pediatr Parent SP - e41806 VL - 6 KW - pre-exposure prophylaxis KW - PrEP KW - prophylaxis KW - internet use KW - search engine KW - adolescent KW - youth KW - pediatric KW - adolescence KW - young adult KW - readability KW - human immunodeficiency virus KW - HIV KW - virtual assistant KW - health information KW - information quality KW - accuracy KW - credibility KW - patient education KW - comprehension KW - comprehensible KW - web-based KW - online information KW - sexual health KW - reading level N2 - Background: Adolescents and young adults are disproportionately affected by HIV, suggesting that HIV prevention methods such as pre-exposure prophylaxis (PrEP) should focus on this group as a priority. As digital natives, youth likely turn to internet resources regarding health topics they may not feel comfortable discussing with their medical providers. To optimize informed decision-making by adolescents and young adults most impacted by HIV, the information from internet searches should be educational, accurate, and readable. Objective: The aims of this study were to compare the accuracy of web-based PrEP information found using web search engines and virtual assistants, and to assess the readability of the resulting information. Methods: Adolescent HIV prevention clinical experts developed a list of 23 prevention-related questions that were posed to search engines (Ask.com, Bing, Google, and Yahoo) and virtual assistants (Amazon Alexa, Microsoft Cortana, Google Assistant, and Apple Siri). The first three results from search engines and virtual assistant web references, as well as virtual assistant verbal responses, were recorded and coded using a six-tier scale to assess the quality of information produced. 
The results were also entered in a web-based tool determining readability using the Flesch-Kincaid Grade Level scale. Results: Google web search engine and Google Assistant more frequently produced PrEP information of higher quality than the other search engines and virtual assistants with scores ranging from 3.4 to 3.7 and 2.8 to 3.3, respectively. Additionally, the resulting information generally was presented in language at a seventh and 10th grade reading level according to the Flesch-Kincaid Grade Level scale. Conclusions: Adolescents and young adults are large consumers of technology and may experience discomfort discussing their sexual health with providers. It is important that efforts are made to ensure the information they receive about HIV prevention methods, and PrEP in particular, is comprehensive, comprehensible, and widely available. UR - https://pediatrics.jmir.org/2023/1/e41806 UR - http://dx.doi.org/10.2196/41806 UR - http://www.ncbi.nlm.nih.gov/pubmed/37463044 ID - info:doi/10.2196/41806 ER - TY - JOUR AU - Gilson, Aidan AU - Safranek, W. Conrad AU - Huang, Thomas AU - Socrates, Vimig AU - Chi, Ling AU - Taylor, Andrew Richard AU - Chartash, David PY - 2023/7/13 TI - Authors' Reply to: Variability in Large Language Models' Responses to Medical Licensing and Certification Examinations JO - JMIR Med Educ SP - e50336 VL - 9 KW - natural language processing KW - NLP KW - MedQA KW - generative pre-trained transformer KW - GPT KW - medical education KW - chatbot KW - artificial intelligence KW - AI KW - education technology KW - ChatGPT KW - conversational agent KW - machine learning KW - large language models KW - knowledge assessment UR - https://mededu.jmir.org/2023/1/e50336 UR - http://dx.doi.org/10.2196/50336 UR - http://www.ncbi.nlm.nih.gov/pubmed/37440299 ID - info:doi/10.2196/50336 ER - TY - JOUR AU - Epstein, H. Richard AU - Dexter, Franklin PY - 2023/7/13 TI - Variability in Large Language Models' 
Responses to Medical Licensing and Certification Examinations. Comment on "How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment" JO - JMIR Med Educ SP - e48305 VL - 9 KW - natural language processing KW - NLP KW - MedQA KW - generative pre-trained transformer KW - GPT KW - medical education KW - chatbot KW - artificial intelligence KW - AI KW - education technology KW - ChatGPT KW - Google Bard KW - conversational agent KW - machine learning KW - large language models KW - knowledge assessment UR - https://mededu.jmir.org/2023/1/e48305 UR - http://dx.doi.org/10.2196/48305 UR - http://www.ncbi.nlm.nih.gov/pubmed/37440293 ID - info:doi/10.2196/48305 ER - TY - JOUR AU - Nov, Oded AU - Singh, Nina AU - Mann, Devin PY - 2023/7/10 TI - Putting ChatGPT's Medical Advice to the (Turing) Test: Survey Study JO - JMIR Med Educ SP - e46939 VL - 9 KW - artificial intelligence KW - AI KW - ChatGPT KW - large language model KW - patient-provider interaction KW - chatbot KW - feasibility KW - ethics KW - privacy KW - language model KW - machine learning N2 - Background: Chatbots are being piloted to draft responses to patient questions, but patients' ability to distinguish between provider and chatbot responses and patients' trust in chatbots' functions are not well established. Objective: This study aimed to assess the feasibility of using ChatGPT (Chat Generative Pre-trained Transformer) or a similar artificial intelligence–based chatbot for patient-provider communication. Methods: A survey study was conducted in January 2023. Ten representative, nonadministrative patient-provider interactions were extracted from the electronic health record. Patients' questions were entered into ChatGPT with a request for the chatbot to respond using approximately the same word count as the human provider's response. 
In the survey, each patient question was followed by a provider- or ChatGPT-generated response. Participants were informed that 5 responses were provider generated and 5 were chatbot generated. Participants were asked, and incentivized financially, to correctly identify the response source. Participants were also asked about their trust in chatbots' functions in patient-provider communication, using a Likert scale from 1 to 5. Results: A US-representative sample of 430 study participants aged 18 and older were recruited on Prolific, a crowdsourcing platform for academic studies. In all, 426 participants filled out the full survey. After removing participants who spent less than 3 minutes on the survey, 392 respondents remained. Overall, 53.3% (209/392) of respondents analyzed were women, and the average age was 47.1 (range 18-91) years. The correct classification of responses ranged from 49% (192/392) to 85.7% (336/392) for different questions. On average, chatbot responses were identified correctly in 65.5% (1284/1960) of the cases, and human provider responses were identified correctly in 65.1% (1276/1960) of the cases. On average, responses toward patients' trust in chatbots' functions were weakly positive (mean Likert score 3.4 out of 5), with lower trust as the health-related complexity of the task in the questions increased. Conclusions: ChatGPT responses to patient questions were weakly distinguishable from provider responses. Laypeople appear to trust the use of chatbots to answer lower-risk health questions. It is important to continue studying patient-chatbot interaction as chatbots move from administrative to more clinical roles in health care. 
UR - https://mededu.jmir.org/2023/1/e46939 UR - http://dx.doi.org/10.2196/46939 UR - http://www.ncbi.nlm.nih.gov/pubmed/37428540 ID - info:doi/10.2196/46939 ER - TY - JOUR AU - Potts, Courtney AU - Lindström, Frida AU - Bond, Raymond AU - Mulvenna, Maurice AU - Booth, Frederick AU - Ennis, Edel AU - Parding, Karolina AU - Kostenius, Catrine AU - Broderick, Thomas AU - Boyd, Kyle AU - Vartiainen, Anna-Kaisa AU - Nieminen, Heidi AU - Burns, Con AU - Bickerdike, Andrea AU - Kuosmanen, Lauri AU - Dhanapala, Indika AU - Vakaloudis, Alex AU - Cahill, Brian AU - MacInnes, Marion AU - Malcolm, Martin AU - O'Neill, Siobhan PY - 2023/7/6 TI - A Multilingual Digital Mental Health and Well-Being Chatbot (ChatPal): Pre-Post Multicenter Intervention Study JO - J Med Internet Res SP - e43051 VL - 25 KW - conversational user interfaces KW - digital interventions KW - Warwick-Edinburgh Mental Well-Being Scale KW - Satisfaction With Life Scale KW - World Health Organization-Five Well-Being Index Scale KW - mental health KW - apps KW - health care KW - mixed methods KW - conversation agent KW - mental well-being KW - digital health intervention N2 - Background: In recent years, advances in technology have led to an influx of mental health apps, in particular the development of mental health and well-being chatbots, which have already shown promise in terms of their efficacy, availability, and accessibility. The ChatPal chatbot was developed to promote positive mental well-being among citizens living in rural areas. ChatPal is a multilingual chatbot, available in English, Scottish Gaelic, Swedish, and Finnish, containing psychoeducational content and exercises such as mindfulness and breathing, mood logging, gratitude, and thought diaries. Objective: The primary objective of this study is to evaluate a multilingual mental health and well-being chatbot (ChatPal) to establish if it has an effect on mental well-being. 
Secondary objectives include investigating the characteristics of individuals that showed improvements in well-being along with those with worsening well-being and applying thematic analysis to user feedback. Methods: A pre-post intervention study was conducted where participants were recruited to use the intervention (ChatPal) for a 12-week period. Recruitment took place across 5 regions: Northern Ireland, Scotland, the Republic of Ireland, Sweden, and Finland. Outcome measures included the Short Warwick-Edinburgh Mental Well-Being Scale, the World Health Organization-Five Well-Being Index, and the Satisfaction with Life Scale, which were evaluated at baseline, midpoint, and end point. Written feedback was collected from participants and subjected to qualitative analysis to identify themes. Results: A total of 348 people were recruited to the study (n=254, 73% female; n=94, 27% male) aged between 18 and 73 (mean 30) years. The well-being scores of participants improved from baseline to midpoint and from baseline to end point; however, improvement in scores was not statistically significant on the Short Warwick-Edinburgh Mental Well-Being Scale (P=.42), the World Health Organization-Five Well-Being Index (P=.52), or the Satisfaction With Life Scale (P=.81). Individuals that had improved well-being scores (n=16) interacted more with the chatbot and were significantly younger compared to those whose well-being declined over the study (P=.03). Three themes were identified from user feedback, including "positive experiences," "mixed or neutral experiences," and "negative experiences." Positive experiences included enjoying exercises provided by the chatbot, while most of the mixed, neutral, or negative experiences mentioned liking the chatbot overall, but there were some barriers, such as technical or performance errors, that needed to be overcome. Conclusions: Marginal improvements in mental well-being were seen in those who used ChatPal, albeit nonsignificant. 
We propose that the chatbot could be used along with other service offerings to complement different digital or face-to-face services, although further research should be carried out to confirm the effectiveness of this approach. Nonetheless, this paper highlights the need for blended service offerings in mental health care. UR - https://www.jmir.org/2023/1/e43051 UR - http://dx.doi.org/10.2196/43051 UR - http://www.ncbi.nlm.nih.gov/pubmed/37410537 ID - info:doi/10.2196/43051 ER - TY - JOUR AU - Booth, Frederick AU - Potts, Courtney AU - Bond, Raymond AU - Mulvenna, Maurice AU - Kostenius, Catrine AU - Dhanapala, Indika AU - Vakaloudis, Alex AU - Cahill, Brian AU - Kuosmanen, Lauri AU - Ennis, Edel PY - 2023/7/6 TI - A Mental Health and Well-Being Chatbot: User Event Log Analysis JO - JMIR Mhealth Uhealth SP - e43052 VL - 11 KW - mental well-being KW - positive psychology KW - data analysis KW - health care KW - event log analysis KW - ecological momentary assessment KW - conversational user interface KW - user behavior KW - conversational agent KW - user interface KW - user data KW - digital health application KW - mobile health app KW - digital intervention N2 - Background: Conversational user interfaces, or chatbots, are becoming more popular in the realm of digital health and well-being. While many studies focus on measuring the cause or effect of a digital intervention on people's health and well-being (outcomes), there is a need to understand how users really engage and use a digital intervention in the real world. Objective: In this study, we examine the user logs of a mental well-being chatbot called ChatPal, which is based on the concept of positive psychology. The aim of this research is to analyze the log data from the chatbot to provide insight into usage patterns, the different types of users using clustering, and associations between the usage of the app's features. Methods: Log data from ChatPal was analyzed to explore usage. 
A number of user characteristics including user tenure, unique days, mood logs recorded, conversations accessed, and total number of interactions were used with k-means clustering to identify user archetypes. Association rule mining was used to explore links between conversations. Results: ChatPal log data revealed 579 individuals older than 18 years used the app with most users being female (n=387, 67%). User interactions peaked around breakfast, lunchtime, and early evening. Clustering revealed 3 groups including "abandoning users" (n=473), "sporadic users" (n=93), and "frequent transient users" (n=13). Each cluster had distinct usage characteristics, and the features were significantly different (P<.001) across each group. While all conversations within the chatbot were accessed at least once by users, the "treat yourself like a friend" conversation was the most popular, which was accessed by 29% (n=168) of users. However, only 11.7% (n=68) of users repeated this exercise more than once. Analysis of transitions between conversations revealed strong links between "treat yourself like a friend," "soothing touch," and "thoughts diary" among others. Association rule mining confirmed these 3 conversations as having the strongest linkages and suggested other associations between the co-use of chatbot features. Conclusions: This study has provided insight into the types of people using the ChatPal chatbot, patterns of use, and associations between the usage of the app's features, which can be used to further develop the app by considering the features most accessed by users. 
UR - https://mhealth.jmir.org/2023/1/e43052 UR - http://dx.doi.org/10.2196/43052 UR - http://www.ncbi.nlm.nih.gov/pubmed/37410539 ID - info:doi/10.2196/43052 ER - TY - JOUR AU - Walker, Louise Harriet AU - Ghani, Shahi AU - Kuemmerli, Christoph AU - Nebiker, Andreas Christian AU - Müller, Peter Beat AU - Raptis, Aristotle Dimitri AU - Staubli, Manuel Sebastian PY - 2023/6/30 TI - Reliability of Medical Information Provided by ChatGPT: Assessment Against Clinical Guidelines and Patient Information Quality Instrument JO - J Med Internet Res SP - e47479 VL - 25 KW - artificial intelligence KW - internet information KW - patient information KW - ChatGPT KW - EQIP tool KW - chatbot KW - chatbots KW - conversational agent KW - conversational agents KW - internal medicine KW - pancreas KW - liver KW - hepatic KW - biliary KW - gall KW - bile KW - gallstone KW - pancreatitis KW - pancreatic KW - medical information N2 - Background: ChatGPT-4 is the latest release of a novel artificial intelligence (AI) chatbot able to answer freely formulated and complex questions. In the near future, ChatGPT could become the new standard for health care professionals and patients to access medical information. However, little is known about the quality of medical information provided by the AI. Objective: We aimed to assess the reliability of medical information provided by ChatGPT. Methods: Medical information provided by ChatGPT-4 on the 5 hepato-pancreatico-biliary (HPB) conditions with the highest global disease burden was measured with the Ensuring Quality Information for Patients (EQIP) tool. The EQIP tool is used to measure the quality of internet-available information and consists of 36 items that are divided into 3 subsections. In addition, 5 guideline recommendations per analyzed condition were rephrased as questions and input to ChatGPT, and agreement between the guidelines and the AI answer was measured by 2 authors independently. 
All queries were repeated 3 times to measure the internal consistency of ChatGPT. Results: Five conditions were identified (gallstone disease, pancreatitis, liver cirrhosis, pancreatic cancer, and hepatocellular carcinoma). The median EQIP score across all conditions was 16 (IQR 14.5-18) for the total of 36 items. Divided by subsection, median scores for content, identification, and structure data were 10 (IQR 9.5-12.5), 1 (IQR 1-1), and 4 (IQR 4-5), respectively. Agreement between guideline recommendations and answers provided by ChatGPT was 60% (15/25). Interrater agreement as measured by the Fleiss κ was 0.78 (P<.001), indicating substantial agreement. Internal consistency of the answers provided by ChatGPT was 100%. Conclusions: ChatGPT provides medical information of comparable quality to available static internet information. Although currently of limited quality, large language models could become the future standard for patients and health care professionals to gather medical information. UR - https://www.jmir.org/2023/1/e47479 UR - http://dx.doi.org/10.2196/47479 UR - http://www.ncbi.nlm.nih.gov/pubmed/37389908 ID - info:doi/10.2196/47479 ER - TY - JOUR AU - Chen, Siyu AU - Zhang, Qingpeng AU - Chan, Chee-kit AU - Yu, Fuk-yuen AU - Chidgey, Andrew AU - Fang, Yuan AU - Mo, H. Phoenix K. AU - Wang, Zixin PY - 2023/6/30 TI - Evaluating an Innovative HIV Self-Testing Service With Web-Based, Real-Time Counseling Provided by an Artificial Intelligence Chatbot (HIVST-Chatbot) in Increasing HIV Self-Testing Use Among Chinese Men Who Have Sex With Men: Protocol for a Noninferiority Randomized Controlled Trial JO - JMIR Res Protoc SP - e48447 VL - 12 KW - Chatbot KW - counseling KW - HIV self-testing KW - men who have sex with men KW - non-inferiority randomized controlled trial N2 - Background: Counseling support for HIV self-testing (HIVST) users is essential to ensure support and linkage to care among men who have sex with men (MSM). 
An HIVST service with web-based real-time instruction, pretest, and posttest counseling provided by trained administrators (HIVST-OIC) was developed by previous projects. Although the HIVST-OIC was highly effective in increasing HIVST uptake and the proportion of HIVST users receiving counseling along with testing, it required intensive resources to implement and sustain. The service capacity of HIVST-OIC cannot meet the increasing demands of HIVST. Objective: This randomized controlled trial primarily aims to establish whether HIVST-chatbot, an innovative HIVST service with web-based real-time instruction and counseling provided by a fully automated chatbot, would produce effects that are similar to HIVST-OIC in increasing HIVST uptake and the proportion of HIVST users receiving counseling alongside testing among MSM within a 6-month follow-up period. Methods: A parallel-group, noninferiority randomized controlled trial will be conducted with Chinese-speaking MSM aged ≥18 years with access to live-chat applications. A total of 528 participants will be recruited through multiple sources, including outreach in gay venues, web-based advertisement, and peer referral. After completing the baseline telephone survey, participants will be randomized evenly into the intervention or control groups. Intervention group participants will watch a web-based video promoting HIVST-chatbot and receive a free HIVST kit. The chatbot will contact the participant to implement HIVST and provide standard-of-care, real-time pretest and posttest counseling and instructions on how to use the HIVST kit through WhatsApp. Control group participants will watch a web-based video promoting HIVST-OIC and receive a free HIVST kit in the same manner. Upon appointment, a trained testing administrator will implement HIVST and provide standard-of-care, real-time pretest and posttest counseling and instructions on how to use the HIVST kit through live-chat applications. 
All participants will complete a telephone follow-up survey 6 months after the baseline. The primary outcomes are HIVST uptake and the proportion of HIVST users receiving counseling support along with testing in the past 6 months, measured at month 6. Secondary outcomes include sexual risk behaviors and uptake of HIV testing other than HIVST during the follow-up period. Intention-to-treat analysis will be used. Results: Recruitment and enrollment of participants started in April 2023. Conclusions: This study will generate important research and policy implications regarding chatbot use in HIVST services. If HIVST-chatbot is proven noninferior to HIVST-OIC, it can be easily integrated into existing HIVST services in Hong Kong, given its relatively low resource requirements for implementation and maintenance. HIVST-chatbot can potentially overcome the barriers to using HIVST. Therefore, the coverage of HIV testing, the level of support, and the linkage to care for MSM HIVST users will be increased. 
Trial Registration: ClinicalTrials.gov NCT05796622; https://clinicaltrials.gov/ct2/show/NCT05796622 International Registered Report Identifier (IRRID): PRR1-10.2196/48447 UR - https://www.researchprotocols.org/2023/1/e48447 UR - http://dx.doi.org/10.2196/48447 UR - http://www.ncbi.nlm.nih.gov/pubmed/37389935 ID - info:doi/10.2196/48447 ER - TY - JOUR AU - Takagi, Soshi AU - Watari, Takashi AU - Erabi, Ayano AU - Sakaguchi, Kota PY - 2023/6/29 TI - Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison Study JO - JMIR Med Educ SP - e48002 VL - 9 KW - ChatGPT KW - Chat Generative Pre-trained Transformer KW - GPT-4 KW - Generative Pre-trained Transformer 4 KW - artificial intelligence KW - AI KW - medical education KW - Japanese Medical Licensing Examination KW - medical licensing KW - clinical support KW - learning model N2 - Background: The competence of ChatGPT (Chat Generative Pre-Trained Transformer) in non-English languages is not well studied. Objective: This study compared the performances of GPT-3.5 (Generative Pre-trained Transformer) and GPT-4 on the Japanese Medical Licensing Examination (JMLE) to evaluate the reliability of these models for clinical reasoning and medical knowledge in non-English languages. Methods: This study used the default mode of ChatGPT, which is based on GPT-3.5; the GPT-4 model of ChatGPT Plus; and the 117th JMLE in 2023. A total of 254 questions were included in the final analysis, which were categorized into 3 types, namely general, clinical, and clinical sentence questions. Results: The results indicated that GPT-4 outperformed GPT-3.5 in terms of accuracy, particularly for general, clinical, and clinical sentence questions. GPT-4 also performed better on difficult questions and specific disease questions. Furthermore, GPT-4 achieved the passing criteria for the JMLE, indicating its reliability for clinical reasoning and medical knowledge in non-English languages.
Conclusions: GPT-4 could become a valuable tool for medical education and clinical support in non-English-speaking regions, such as Japan. UR - https://mededu.jmir.org/2023/1/e48002 UR - http://dx.doi.org/10.2196/48002 UR - http://www.ncbi.nlm.nih.gov/pubmed/37384388 ID - info:doi/10.2196/48002 ER - TY - JOUR AU - Liu, Jialin AU - Wang, Changyu AU - Liu, Siru PY - 2023/6/28 TI - Utility of ChatGPT in Clinical Practice JO - J Med Internet Res SP - e48568 VL - 25 KW - ChatGPT KW - artificial intelligence KW - large language models KW - clinical practice KW - large language model KW - natural language processing KW - NLP KW - doctor-patient KW - patient-physician KW - communication KW - challenges KW - barriers KW - recommendations KW - guidance KW - guidelines KW - best practices KW - risks UR - https://www.jmir.org/2023/1/e48568 UR - http://dx.doi.org/10.2196/48568 UR - http://www.ncbi.nlm.nih.gov/pubmed/37379067 ID - info:doi/10.2196/48568 ER - TY - JOUR AU - Massa, Paula AU - de Souza Ferraz, Aurélia Dulce AU - Magno, Laio AU - Silva, Paula Ana AU - Greco, Marília AU - Dourado, Inês AU - Grangeiro, Alexandre PY - 2023/6/23 TI - A Transgender Chatbot (Amanda Selfie) to Create Pre-exposure Prophylaxis Demand Among Adolescents in Brazil: Assessment of Acceptability, Functionality, Usability, and Results JO - J Med Internet Res SP - e41881 VL - 25 KW - artificial intelligence KW - adolescent KW - HIV pre-exposure prophylaxis KW - transgender women KW - men who have sex with men KW - chatbot KW - PrEP KW - transgender KW - HIV KW - prevention KW - prophylaxis KW - acceptability N2 - Background: HIV incidence rates have increased in adolescent men who have sex with men (AMSM) and adolescent transgender women (ATGW). Thus, it is essential to promote access to HIV prevention, including pre-exposure prophylaxis (PrEP), among these groups.
Moreover, using artificial intelligence and online social platforms to create demand for and access to health care services is an essential tool for adolescents and youth. Objective: This study aims to describe the participative process of developing a chatbot using artificial intelligence to create demand for PrEP use among AMSM and ATGW in Brazil. Furthermore, it analyzes the chatbot's acceptability, functionality, and usability and its results on the demand creation for PrEP. Methods: The chatbot Amanda Selfie integrates the demand creation strategies based on social networks (DCSSNs) of the PrEP1519 study. She was conceived as a Black transgender woman and designed to function as a virtual peer educator. The development process occurred in 3 phases (conception, trial, and final version) and lasted 21 months. A mixed methodology was used for the evaluations. Qualitative approaches, such as in-depth adolescent interviews, were used to analyze acceptability and usability, while quantitative methods were used to analyze the functionality and result of the demand creation for PrEP based on interactions with Amanda and information from health care services about using PrEP. To evaluate Amanda's result on the demand creation for PrEP, we analyzed sociodemographic profiles of adolescents who interacted at least once with her and developed a cascade model containing the number of people at various stages between the first interaction and initiation of PrEP (PrEP uptake). These indicators were compared with other DCSSNs developed in the PrEP1519 study using chi-square tests and residual analysis (P=.05). Results: Amanda Selfie was well accepted as a peer educator, clearly and objectively communicating on topics such as gender identity, sexual experiences, HIV, and PrEP. The chatbot proved appropriate for answering questions in an agile and confidential manner, using the language used by AMSM and ATGW and with a greater sense of security and less judgment.
The interactions with Amanda Selfie combined with a health professional were well evaluated and improved the appointment scheduling. The chatbot interacted with most people (757/1239, 61.1%) reached by the DCSSNs. However, when compared with the other DCSSNs, Amanda was not efficient in identifying AMSM/ATGW (359/482, 74.5% vs 130/757, 17.2% of total interactions, respectively) and in PrEP uptake (90/359, 25.1% vs 19/130, 14.6%). The following profiles were associated (P<.001) with Amanda Selfie's demand creation, when compared with other DCSSNs: ATGW and adolescents with higher levels of schooling and White skin color. Conclusions: Using a chatbot to create PrEP demand among AMSM and ATGW was well accepted, especially for ATGW with higher levels of schooling. A complementary dialog with a health professional increased PrEP uptake, although it remained lower than the results of the other DCSSNs. UR - https://www.jmir.org/2023/1/e41881 UR - http://dx.doi.org/10.2196/41881 UR - http://www.ncbi.nlm.nih.gov/pubmed/37351920 ID - info:doi/10.2196/41881 ER - TY - JOUR AU - Mesko, Bertalan PY - 2023/6/22 TI - The ChatGPT (Generative Artificial Intelligence) Revolution Has Made Artificial Intelligence Approachable for Medical Professionals JO - J Med Internet Res SP - e48392 VL - 25 KW - artificial intelligence KW - digital health KW - future KW - technology KW - ChatGPT KW - medical practice KW - large language model KW - language model KW - generative KW - conversational agent KW - conversation agents KW - chatbot KW - generated text KW - computer generated KW - medical education KW - continuing education KW - professional development KW - curriculum KW - curricula UR - https://www.jmir.org/2023/1/e48392 UR - http://dx.doi.org/10.2196/48392 UR - http://www.ncbi.nlm.nih.gov/pubmed/37347508 ID - info:doi/10.2196/48392 ER - TY - JOUR AU - van der Schyff, L. Emma AU - Ridout, Brad AU - Amon, L. Krestina AU - Forsyth, Rowena AU - Campbell, J. 
Andrew PY - 2023/6/19 TI - Providing Self-Led Mental Health Support Through an Artificial Intelligence-Powered Chat Bot (Leora) to Meet the Demand of Mental Health Care JO - J Med Internet Res SP - e46448 VL - 25 KW - mental health KW - chatbots KW - conversational agents KW - anxiety KW - depression KW - AI KW - support KW - web-based service KW - web-based KW - deployment KW - stigma KW - users KW - symptoms KW - mental health care KW - self-led UR - https://www.jmir.org/2023/1/e46448 UR - http://dx.doi.org/10.2196/46448 UR - http://www.ncbi.nlm.nih.gov/pubmed/37335608 ID - info:doi/10.2196/46448 ER - TY - JOUR AU - Matheson, L. Emily AU - Smith, G. Harriet AU - Amaral, S. Ana C. AU - Meireles, F. Juliana F. AU - Almeida, C. Mireille AU - Linardon, Jake AU - Fuller-Tyszkiewicz, Matthew AU - Diedrichs, C. Phillippa PY - 2023/6/19 TI - Using Chatbot Technology to Improve Brazilian Adolescents' Body Image and Mental Health at Scale: Randomized Controlled Trial JO - JMIR Mhealth Uhealth SP - e39934 VL - 11 KW - adolescent KW - Brazil KW - body image KW - chatbot KW - microintervention KW - randomized controlled trial KW - mobile phone N2 - Background: Accessible, cost-effective, and scalable mental health interventions are limited, particularly in low- and middle-income countries, where disparities between mental health needs and services are greatest. Microinterventions (ie, brief, stand-alone, or digital approaches) aim to provide immediate reprieve and enhancements in mental health states and offer a novel and scalable framework for embedding evidence-based mental health promotion techniques into digital environments. Body image is a global public health issue that increases young people's risk of developing more severe mental and physical health issues.
Embedding body image microinterventions into digital environments is one avenue for providing young people with immediate and short-term reprieve and protection from the negative exposure effects associated with social media. Objective: This 2-armed, fully remote, and preregistered randomized controlled trial assessed the impact of a body image chatbot containing microinterventions on Brazilian adolescents' state and trait body image and associated well-being outcomes. Methods: Geographically diverse Brazilian adolescents aged 13-18 years (901/1715, 52.54% girls) were randomized into the chatbot or an assessment-only control condition and completed web-based self-assessments at baseline, immediately after the intervention time frame, and at 1-week and 1-month follow-ups. The primary outcomes were mean change in state (at chatbot entry and at the completion of a microintervention technique) and trait body image (before and after the intervention), with the secondary outcomes being mean change in affect (state and trait) and body image self-efficacy between the assessment time points. Results: Most participants who entered the chatbot (258/327, 78.9%) completed ≥1 microintervention technique, with participants completing an average of 5 techniques over the 72-hour intervention period. Chatbot users experienced small significant improvements in primary (state: P<.001, Cohen d=0.30, 95% CI 0.25-0.34; and trait body image: P=.02, Cohen d range=0.10, 95% CI 0.01-0.18, to 0.26, 95% CI 0.13-0.32) and secondary outcomes across various time points (state: P<.001, Cohen d=0.28, 95% CI 0.22-0.33; trait positive affect: P=.02, Cohen d range=0.15, 95% CI 0.03-0.27, to 0.23, 95% CI 0.08-0.37; negative affect: P=.03, Cohen d range=−0.16, 95% CI −0.30 to −0.02, to −0.18, 95% CI −0.33 to −0.03; and self-efficacy: P=.02, Cohen d range=0.14, 95% CI 0.03-0.25, to 0.19, 95% CI 0.08-0.32) relative to the control condition.
Intervention benefits were moderated by baseline levels of concerns but not by gender. Conclusions: This is the first large-scale randomized controlled trial assessing a body image chatbot among Brazilian adolescents. Intervention attrition was high (531/858, 61.9%) and reflected the broader digital intervention literature; barriers to engagement were discussed. Meanwhile, the findings support the emerging literature that indicates microinterventions and chatbot technology are acceptable and effective web-based service provisions. This study also offers a blueprint for accessible, cost-effective, and scalable digital approaches that address disparities between health care needs and provisions in low- and middle-income countries. Trial Registration: ClinicalTrials.gov NCT04825184; http://clinicaltrials.gov/ct2/show/NCT04825184 International Registered Report Identifier (IRRID): RR2-10.1186/s12889-021-12129-1 UR - https://mhealth.jmir.org/2023/1/e39934 UR - http://dx.doi.org/10.2196/39934 UR - http://www.ncbi.nlm.nih.gov/pubmed/37335604 ID - info:doi/10.2196/39934 ER - TY - JOUR AU - Choudhury, Avishek AU - Shamszare, Hamid PY - 2023/6/14 TI - Investigating the Impact of User Trust on the Adoption and Use of ChatGPT: Survey Analysis JO - J Med Internet Res SP - e47184 VL - 25 KW - ChatGPT KW - trust in AI KW - artificial intelligence KW - technology adoption KW - behavioral intention KW - chatbot KW - human factors KW - trust KW - adoption KW - intent KW - survey KW - shared accountability KW - AI policy N2 - Background: ChatGPT (Chat Generative Pre-trained Transformer) has gained popularity for its ability to generate human-like responses. It is essential to note that overreliance or blind trust in ChatGPT, especially in high-stakes decision-making contexts, can have severe consequences. Similarly, lacking trust in the technology can lead to underuse, resulting in missed opportunities. Objective: This study investigated the impact of users' 
trust in ChatGPT on their intent and actual use of the technology. Four hypotheses were tested: (1) users' intent to use ChatGPT increases with their trust in the technology; (2) the actual use of ChatGPT increases with users' intent to use the technology; (3) the actual use of ChatGPT increases with users' trust in the technology; and (4) users' intent to use ChatGPT can partially mediate the effect of trust in the technology on its actual use. Methods: This study distributed a web-based survey to adults in the United States who actively used ChatGPT (version 3.5) at least once a month, from February through March 2023. The survey responses were used to develop 2 latent constructs: Trust and Intent to Use, with Actual Use being the outcome variable. The study used partial least squares structural equation modeling to evaluate and test the structural model and hypotheses. Results: In the study, 607 respondents completed the survey. The primary uses of ChatGPT were for information gathering (n=219, 36.1%), entertainment (n=203, 33.4%), and problem-solving (n=135, 22.2%), with a smaller number using it for health-related queries (n=44, 7.2%) and other activities (n=6, 1%). Our model explained 50.5% and 9.8% of the variance in Intent to Use and Actual Use, respectively, with path coefficients of 0.711 and 0.221 for Trust on Intent to Use and Actual Use, respectively. The bootstrapped results supported all 4 hypotheses, with Trust having a significant direct effect on both Intent to Use (β=0.711, 95% CI 0.656-0.764) and Actual Use (β=0.302, 95% CI 0.229-0.374). The indirect effect of Trust on Actual Use, partially mediated by Intent to Use, was also significant (β=0.113, 95% CI 0.001-0.227). Conclusions: Our results suggest that trust is critical to users' adoption of ChatGPT. It remains crucial to highlight that ChatGPT was not initially designed for health care applications.
Therefore, an overreliance on it for health-related advice could potentially lead to misinformation and subsequent health risks. Efforts must be focused on improving ChatGPT's ability to distinguish between queries that it can safely handle and those that should be redirected to human experts (health care professionals). Although risks are associated with excessive trust in artificial intelligence-driven chatbots such as ChatGPT, the potential risks can be reduced by advocating for shared accountability and fostering collaboration between developers, subject matter experts, and human factors researchers. UR - https://www.jmir.org/2023/1/e47184 UR - http://dx.doi.org/10.2196/47184 UR - http://www.ncbi.nlm.nih.gov/pubmed/37314848 ID - info:doi/10.2196/47184 ER - TY - JOUR AU - Phiri, Millie AU - Munoriyarwa, Allen PY - 2023/6/14 TI - Health Chatbots in Africa: Scoping Review JO - J Med Internet Res SP - e35573 VL - 25 KW - chatbots KW - health KW - Africa KW - technology KW - artificial intelligence KW - chatbot KW - health promotion KW - health database KW - World Health Organization KW - WHO KW - rural area KW - epidemiology KW - vulnerable population KW - health sector KW - Cochrane database N2 - Background: This scoping review explores and summarizes the existing literature on the use of chatbots to support and promote health in Africa. Objective: The primary aim was to learn where, and under what circumstances, chatbots have been used effectively for health in Africa; how chatbots have been developed to the best effect; and how they have been evaluated by looking at literature published between 2017 and 2022. A secondary aim was to identify potential lessons and best practices for other chatbots. The review also aimed to highlight directions for future research on the use of chatbots for health in Africa. Methods: Using the 2005 Arksey and O'Malley framework, we used a Boolean search to broadly search literature published between January 2017 and July 2022.
Literature between June 2021 and July 2022 was identified using Google Scholar, EBSCO information services (which includes the African HealthLine, PubMed, MEDLINE, PsycInfo, Cochrane, Embase, Scopus, and Web of Science databases), and other internet sources (including gray literature). The inclusion criteria were literature about health chatbots in Africa published in journals, conference papers, opinion, or white papers. Results: In all, 212 records were screened, and 12 articles met the inclusion criteria. Results were analyzed according to the themes they covered. The themes identified included the purpose of the chatbot as either providing an educational or information-sharing service or providing a counselling service. Accessibility as a result of either technical restrictions or language restrictions was also noted. Other themes that were identified included the need for the consideration of trust, privacy and ethics, and evaluation. Conclusions: The findings demonstrate that current data are insufficient to show whether chatbots are effectively supporting health in the region. However, the review does reveal insights into popular chatbots and the need to make them accessible through language considerations, platform choice, and user trust, as well as the importance of robust evaluation frameworks to assess their impact. The review also provides recommendations on the direction of future research.
UR - https://www.jmir.org/2023/1/e35573 UR - http://dx.doi.org/10.2196/35573 UR - http://www.ncbi.nlm.nih.gov/pubmed/35584083 ID - info:doi/10.2196/35573 ER - TY - JOUR AU - Jackson-Triche, Maga AU - Vetal, Don AU - Turner, Eva-Marie AU - Dahiya, Priya AU - Mangurian, Christina PY - 2023/6/8 TI - Meeting the Behavioral Health Needs of Health Care Workers During COVID-19 by Leveraging Chatbot Technology: Development and Usability Study JO - J Med Internet Res SP - e40635 VL - 25 KW - chatbot technology KW - health care workers KW - mental health equity KW - COVID-19 KW - mental health chatbot KW - behavioral health treatment KW - mental health screening KW - telehealth KW - psychoeducation KW - employee support N2 - Background: During the COVID-19 pandemic, health care systems were faced with the urgent need to implement strategies to address the behavioral health needs of health care workers. A primary concern of any large health care system is developing an easy-to-access, streamlined system of triage and support despite limited behavioral health resources. Objective: This study provides a detailed description of the design and implementation of a chatbot program designed to triage and facilitate access to behavioral health assessment and treatment for the workforce of a large academic medical center. The University of California, San Francisco (UCSF) Faculty, Staff, and Trainee Coping and Resiliency Program (UCSF Cope) aimed to provide timely access to a live telehealth navigator for triage and live telehealth assessment and treatment, curated web-based self-management tools, and nontreatment support groups for those experiencing stress related to their unique roles. Methods: In a public-private partnership, the UCSF Cope team built a chatbot to triage employees based on behavioral health needs. 
The chatbot is an algorithm-based, automated, and interactive artificial intelligence conversational tool that uses natural language understanding to engage users by presenting a series of questions with simple multiple-choice answers. The goal of each chatbot session was to guide users to services that were appropriate for their needs. Designers developed a chatbot data dashboard to identify and follow trends directly through the chatbot. Regarding other program elements, website user data were collected monthly and participant satisfaction was gathered for each nontreatment support group. Results: The UCSF Cope chatbot was rapidly developed and launched on April 20, 2020. As of May 31, 2022, a total of 10.88% (3785/34,790) of employees accessed the technology. Among those reporting any form of psychological distress, 39.7% (708/1783) of employees requested in-person services, including those who had an existing provider. UCSF employees responded positively to all program elements. As of May 31, 2022, the UCSF Cope website had 615,334 unique users, with 66,585 unique views of webinars and 601,471 unique views of video shorts. All units across UCSF were reached by UCSF Cope staff for special interventions, with >40 units requesting these services. Town halls were particularly well received, with >80% of attendees reporting the experience as helpful. Conclusions: UCSF Cope used chatbot technology to incorporate individualized behavioral health triage, assessment, treatment, and general emotional support for an entire employee base (N=34,790). This level of triage for a population of this size would not have been possible without the use of chatbot technology. The UCSF Cope model has the potential to be scaled, adapted, and implemented across both academically and nonacademically affiliated medical settings. 
UR - https://www.jmir.org/2023/1/e40635 UR - http://dx.doi.org/10.2196/40635 UR - http://www.ncbi.nlm.nih.gov/pubmed/37146178 ID - info:doi/10.2196/40635 ER - TY - JOUR AU - Islam, Ashraful AU - Chaudhry, Moalla Beenish PY - 2023/6/8 TI - Design Validation of a Relational Agent by COVID-19 Patients: Mixed Methods Study JO - JMIR Hum Factors SP - e42740 VL - 10 KW - COVID-19 KW - relational agent KW - mHealth KW - design validation KW - health care KW - chatbot KW - digital health intervention KW - health care professional KW - heuristic KW - health promotion KW - mental well-being KW - design validation survey KW - self-isolation N2 - Background: Relational agents (RAs) have shown effectiveness in various health interventions with and without doctors and hospital facilities. In situations such as a pandemic like the COVID-19 pandemic when health care professionals (HCPs) and facilities are unable to cope with increased demands, RAs may play a major role in ameliorating the situation. However, they have not been well explored in this domain. Objective: This study aimed to design a prototypical RA in collaboration with COVID-19 patients and HCPs and test it with the potential users, for its ability to deliver services during a pandemic. Methods: The RA was designed and developed in collaboration with people with COVID-19 (n=21) and 2 groups of HCPs (n=19 and n=16, respectively) to aid COVID-19 patients at various stages by performing 4 main tasks: testing guidance, support during self-isolation, handling emergency situations, and promoting postrecovery mental well-being. A design validation survey was conducted with 98 individuals to evaluate the usability of the prototype using the System Usability Scale (SUS), and the participants provided feedback on the design. In addition, the RA?s usefulness and acceptability were rated by the participants using Likert scales. Results: In the design validation survey, the prototypical RA received an average SUS score of 58.82. 
Moreover, 90% (88/98) of participants perceived it to be helpful, and 69% (68/98) of participants accepted it as a viable alternative to HCPs. The prototypical RA received favorable feedback from the participants, and they were inclined to accept it as an alternative to HCPs in non-life-threatening scenarios despite the usability rating falling below the acceptable threshold. Conclusions: Based on participants' feedback, we recommend further development of the RA with improved automation, emotional support, information provision, tracking, and specific recommendations. UR - https://humanfactors.jmir.org/2023/1/e42740 UR - http://dx.doi.org/10.2196/42740 UR - http://www.ncbi.nlm.nih.gov/pubmed/36350760 ID - info:doi/10.2196/42740 ER - TY - JOUR AU - Karabacak, Mert AU - Ozkara, Berksu Burak AU - Margetis, Konstantinos AU - Wintermark, Max AU - Bisdas, Sotirios PY - 2023/6/6 TI - The Advent of Generative Language Models in Medical Education JO - JMIR Med Educ SP - e48163 VL - 9 KW - generative language model KW - artificial intelligence KW - medical education KW - ChatGPT KW - academic integrity KW - AI-driven feedback KW - stimulation KW - evaluation KW - technology KW - learning environment KW - medical student UR - https://mededu.jmir.org/2023/1/e48163 UR - http://dx.doi.org/10.2196/48163 UR - http://www.ncbi.nlm.nih.gov/pubmed/37279048 ID - info:doi/10.2196/48163 ER - TY - JOUR AU - Perez, Analay AU - Fetters, D. Michael AU - Creswell, W. John AU - Scerbo, Mark AU - Kron, W. Frederick AU - Gonzalez, Richard AU - An, Lawrence AU - Jimbo, Masahito AU - Klasnja, Predrag AU - Guetterman, C. 
Timothy PY - 2023/6/6 TI - Enhancing Nonverbal Communication Through Virtual Human Technology: Protocol for a Mixed Methods Study JO - JMIR Res Protoc SP - e46601 VL - 12 KW - human technology KW - MPathic-VR KW - nonverbal communication behavior KW - patient-provider communication KW - virtual human N2 - Background: Communication is a critical component of the patient-provider relationship; however, limited research exists on the role of nonverbal communication. Virtual human training is an informatics-based educational strategy that offers various benefits in communication skill training directed at providers. Recent informatics-based interventions aimed at improving communication have mainly focused on verbal communication, yet research is needed to better understand how virtual humans can improve verbal and nonverbal communication and further elucidate the patient-provider dyad. Objective: The purpose of this study is to enhance a conceptual model that incorporates technology to examine verbal and nonverbal components of communication and develop a nonverbal assessment that will be included in the virtual simulation for further testing. Methods: This study will consist of a multistage mixed methods design, including convergent and exploratory sequential components. A convergent mixed methods study will be conducted to examine the mediating effects of nonverbal communication. Quantitative (eg, MPathic game scores, Kinect nonverbal data, objective structured clinical examination communication score, and Roter Interaction Analysis System and Facial Action Coding System coding of video) and qualitative data (eg, video recordings of MPathic-virtual reality [VR] interventions and student reflections) will be collected simultaneously. Data will be merged to determine the most crucial components of nonverbal behavior in human-computer interaction. An exploratory sequential design will proceed, consisting of a grounded theory qualitative phase.
Using theoretical, purposeful sampling, interviews will be conducted with oncology providers probing intentional nonverbal behaviors. The qualitative findings will aid the development of a nonverbal communication model that will be included in a virtual human. The subsequent quantitative strand will incorporate and validate a new automated nonverbal communication behavior assessment into the virtual human simulation, MPathic-VR, by assessing interrater reliability, code interactions, and dyadic data analysis by comparing Kinect responses (system recorded) to manually scored records for specific nonverbal behaviors. Data will be integrated using building integration to develop the automated nonverbal communication behavior assessment and conduct a quality check of these nonverbal features. Results: Secondary data from the MPathic-VR randomized controlled trial data set (210 medical students and 840 video recordings of interactions) were analyzed in the first part of this study. Results showed differential experiences by performance in the intervention group. Following the analysis of the convergent design, participants consisting of medical providers (n=30) will be recruited for the qualitative phase of the subsequent exploratory sequential design. We plan to complete data collection by July 2023 to analyze and integrate these findings. Conclusions: The results from this study contribute to the improvement of patient-provider communication, both verbal and nonverbal, including the dissemination of health information and health outcomes for patients. Further, this research aims to transfer to various topical areas, including medication safety, informed consent processes, patient instructions, and treatment adherence between patients and providers. 
International Registered Report Identifier (IRRID): DERR1-10.2196/46601 UR - https://www.researchprotocols.org/2023/1/e46601 UR - http://dx.doi.org/10.2196/46601 UR - http://www.ncbi.nlm.nih.gov/pubmed/37279041 ID - info:doi/10.2196/46601 ER - TY - JOUR AU - Nehme, Mayssam AU - Schneider, Franck AU - Perrin, Anne AU - Sum Yu, Wing AU - Schmitt, Simon AU - Violot, Guillemette AU - Ducrot, Aurelie AU - Tissandier, Frederique AU - Posfay-Barbe, Klara AU - Guessous, Idris PY - 2023/6/5 TI - The Development of a Chatbot Technology to Disseminate Post-COVID-19 Information: Descriptive Implementation Study JO - J Med Internet Res SP - e43113 VL - 25 KW - COVID-19 KW - post-COVID-19 KW - long COVID KW - PASC KW - postacute sequelae of SARS-CoV-2 KW - chatbot KW - medical technology KW - online platform KW - information KW - communication KW - dissemination KW - disease management KW - conversational agent KW - digital surveillance KW - pediatric KW - children KW - caregiver N2 - Background: Post-COVID-19, or long COVID, has now affected millions of individuals, resulting in fatigue, neurocognitive symptoms, and an impact on daily life. The uncertainty of knowledge around this condition, including its overall prevalence, pathophysiology, and management, along with the growing numbers of affected individuals, has created an essential need for information and disease management. This has become even more critical in a time of abundant online misinformation and potential misleading of patients and health care professionals. Objective: The RAFAEL platform is an ecosystem created to address the information about and management of post-COVID-19, integrating online information, webinars, and chatbot technology to answer a large number of individuals in a time- and resource-limited setting. This paper describes the development and deployment of the RAFAEL platform and chatbot in addressing post-COVID-19 in children and adults.
Methods: The RAFAEL study took place in Geneva, Switzerland. The RAFAEL platform and chatbot were made available online, and all users were considered participants of this study. The development phase started in December 2020 and included developing the concept, the backend, and the frontend, as well as beta testing. The specific strategy behind the RAFAEL chatbot balanced an accessible interactive approach with medical safety, aiming to relay correct and verified information for the management of post-COVID-19. Development was followed by deployment with the establishment of partnerships and communication strategies in the French-speaking world. The use of the chatbot and the answers provided were continuously monitored by community moderators and health care professionals, creating a safe fallback for users. Results: To date, the RAFAEL chatbot has had 30,488 interactions, with a 79.6% (6417/8061) matching rate and a 73.2% (n=1795) positive feedback rate out of the 2451 users who provided feedback. Overall, 5807 unique users interacted with the chatbot, with 5.1 interactions per user, on average, and 8061 stories triggered. The use of the RAFAEL chatbot and platform was additionally driven by the monthly thematic webinars as well as communication campaigns, with an average of 250 participants at each webinar. User queries included questions about post-COVID-19 symptoms (n=5612, 69.2%), of which fatigue was the most predominant query (n=1255, 22.4%) in symptoms-related stories. Additional queries included questions about consultations (n=598, 7.4%), treatment (n=527, 6.5%), and general information (n=510, 6.3%). Conclusions: The RAFAEL chatbot is, to the best of our knowledge, the first chatbot developed to address post-COVID-19 in children and adults. Its innovation lies in the use of a scalable tool to disseminate verified information in a time- and resource-limited environment.
Additionally, the use of machine learning could help professionals gain knowledge about a new condition, while concomitantly addressing patients' concerns. Lessons learned from the RAFAEL chatbot will further encourage a participative approach to learning and could potentially be applied to other chronic conditions. UR - https://www.jmir.org/2023/1/e43113 UR - http://dx.doi.org/10.2196/43113 UR - http://www.ncbi.nlm.nih.gov/pubmed/37195688 ID - info:doi/10.2196/43113 ER - TY - JOUR AU - Abd-alrazaq, Alaa AU - AlSaad, Rawan AU - Alhuwail, Dari AU - Ahmed, Arfan AU - Healy, Mark Padraig AU - Latifi, Syed AU - Aziz, Sarah AU - Damseh, Rafat AU - Alabed Alrazak, Sadam AU - Sheikh, Javaid PY - 2023/6/1 TI - Large Language Models in Medical Education: Opportunities, Challenges, and Future Directions JO - JMIR Med Educ SP - e48291 VL - 9 KW - large language models KW - artificial intelligence KW - medical education KW - ChatGPT KW - GPT-4 KW - generative AI KW - students KW - educators UR - https://mededu.jmir.org/2023/1/e48291 UR - http://dx.doi.org/10.2196/48291 UR - http://www.ncbi.nlm.nih.gov/pubmed/37261894 ID - info:doi/10.2196/48291 ER - TY - JOUR AU - Ballester, L. Pedro PY - 2023/5/31 TI - Open Science and Software Assistance: Commentary on "Artificial Intelligence Can Generate Fraudulent but Authentic-Looking Scientific Medical Articles: Pandora's Box Has Been Opened" 
JO - J Med Internet Res SP - e49323 VL - 25 KW - artificial intelligence KW - AI KW - ChatGPT KW - open science KW - reproducibility KW - software assistance UR - https://www.jmir.org/2023/1/e49323 UR - http://dx.doi.org/10.2196/49323 UR - http://www.ncbi.nlm.nih.gov/pubmed/37256656 ID - info:doi/10.2196/49323 ER - TY - JOUR AU - Májovský, Martin AU - Černý, Martin AU - Kasal, Matěj AU - Komarc, Martin AU - Netuka, David PY - 2023/5/31 TI - Artificial Intelligence Can Generate Fraudulent but Authentic-Looking Scientific Medical Articles: Pandora's Box Has Been Opened JO - J Med Internet Res SP - e46924 VL - 25 KW - artificial intelligence KW - publications KW - ethics KW - neurosurgery KW - ChatGPT KW - language models KW - fraudulent medical articles N2 - Background: Artificial intelligence (AI) has advanced substantially in recent years, transforming many industries and improving the way people live and work. In scientific research, AI can enhance the quality and efficiency of data analysis and publication. However, AI has also opened up the possibility of generating high-quality fraudulent papers that are difficult to detect, raising important questions about the integrity of scientific research and the trustworthiness of published papers. Objective: The aim of this study was to investigate the capabilities of current AI language models in generating high-quality fraudulent medical articles. We hypothesized that modern AI models can create highly convincing fraudulent papers that can easily deceive readers and even experienced researchers. Methods: This proof-of-concept study used ChatGPT (Chat Generative Pre-trained Transformer) powered by the GPT-3 (Generative Pre-trained Transformer 3) language model to generate a fraudulent scientific article related to neurosurgery. GPT-3 is a large language model developed by OpenAI that uses deep learning algorithms to generate human-like text in response to prompts given by users. 
The model was trained on a massive corpus of text from the internet and is capable of generating high-quality text in a variety of languages and on various topics. The authors posed questions and prompts to the model and refined them iteratively as the model generated the responses. The goal was to create a completely fabricated article including the abstract, introduction, material and methods, discussion, references, charts, etc. Once the article was generated, it was reviewed for accuracy and coherence by experts in the fields of neurosurgery, psychiatry, and statistics and compared to existing similar articles. Results: The study found that the AI language model can create a highly convincing fraudulent article that resembled a genuine scientific paper in terms of word usage, sentence structure, and overall composition. The AI-generated article included standard sections such as introduction, material and methods, results, and discussion, as well as a data sheet. It consisted of 1992 words and 17 citations, and the whole process of article creation took approximately 1 hour without any special training of the human user. However, there were some concerns and specific mistakes identified in the generated article, specifically in the references. Conclusions: The study demonstrates the potential of current AI language models to generate completely fabricated scientific articles. Although the papers look sophisticated and seemingly flawless, expert readers may identify semantic inaccuracies and errors upon closer inspection. We highlight the need for increased vigilance and better detection methods to combat the potential misuse of AI in scientific research. At the same time, it is important to recognize the potential benefits of using AI language models in genuine scientific writing and research, such as manuscript preparation and language editing. 
UR - https://www.jmir.org/2023/1/e46924 UR - http://dx.doi.org/10.2196/46924 UR - http://www.ncbi.nlm.nih.gov/pubmed/37256685 ID - info:doi/10.2196/46924 ER - TY - JOUR AU - Braddock, Traylor William Richard AU - Ocasio, A. Manuel AU - Comulada, Scott W. AU - Mandani, Jan AU - Fernandez, Isabel M. PY - 2023/5/31 TI - Increasing Participation in a TelePrEP Program for Sexual and Gender Minority Adolescents and Young Adults in Louisiana: Protocol for an SMS Text Messaging–Based Chatbot JO - JMIR Res Protoc SP - e42983 VL - 12 KW - chatbot KW - conversational agent KW - develop KW - iterative KW - messaging KW - text message KW - HIV KW - PrEP KW - pre-exposure prophylaxis KW - user testing KW - rule-based KW - prevention KW - eHealth KW - telehealth KW - mobile phone KW - sexual minority youth KW - gender minority youth KW - young adult KW - youth KW - adolescent KW - sexual minority KW - gender minority KW - gender diverse KW - gender diversity KW - SMS KW - artificial intelligence KW - patient education KW - health information KW - web-based information KW - user experience N2 - Background: Sexual and gender minority (SGM) adolescents and young adults (AYAs) are at increased risk of HIV infection, particularly in the Southern United States. Despite the availability of effective biomedical prevention strategies, such as pre-exposure prophylaxis (PrEP), access and uptake remain low among SGM AYAs. In response, the Louisiana Department of Health initiated the LA TelePrEP Program, which leverages the power of telemedicine to connect Louisiana residents to PrEP. A virtual TelePrEP Navigator guides users through the enrollment process, answers questions, schedules appointments, and facilitates lab testing and medication delivery. To increase the participation of SGM AYAs in the program, the TelePrEP program partnered with researchers to develop a chatbot that would facilitate access to the program and support navigator functions. 
Chatbots are capable of carrying out many functions that reduce employee workload, and despite their successful use in health care and public health, they are relatively new to HIV prevention. Objective: In this paper, we describe the iterative and community-engaged process that we used to develop an SMS text messaging–based chatbot tailored to SGM AYAs that would support navigator functions and disseminate PrEP-related information. Methods: Our process comprised 2 phases: conceptualization and development. In the conceptualization phase, aspects of navigator responsibilities, program logistics, and user interactions to prioritize in chatbot programming (eg, scheduling appointments and answering questions) were identified. We also selected a commercially available chatbot platform that could execute these functions and could be programmed with minimal coding experience. In the development phase, we engaged Department of Health staff and SGM AYAs within our professional and personal networks. Five different rounds of testing were conducted with various groups to evaluate each iteration of the chatbot. After each iteration of the testing process, the research team met to discuss feedback, guide the programmer on incorporating modifications, and re-evaluate the chatbot's functionality. Results: Through our highly collaborative and community-engaged process, a rule-based chatbot with artificial intelligence components was successfully created. We gained important knowledge that could advance future chatbot development efforts for HIV prevention. Key to the PrEPBot's success was resolving issues that hampered the user experience, like asking unnecessary questions, responding too quickly, and misunderstanding user input. Conclusions: HIV prevention researchers can feasibly and efficiently program a rule-based chatbot with the assistance of commercially available tools. 
Our iterative process of engaging researchers, program personnel, and different subgroups of SGM AYAs to obtain input was key to successful chatbot development. If the results of this pilot trial show that the chatbot is feasible and acceptable to SGM AYAs, future HIV researchers and practitioners could consider incorporating chatbots as part of their programs. International Registered Report Identifier (IRRID): PRR1-10.2196/42983 UR - https://www.researchprotocols.org/2023/1/e42983 UR - http://dx.doi.org/10.2196/42983 UR - http://www.ncbi.nlm.nih.gov/pubmed/37256669 ID - info:doi/10.2196/42983 ER - TY - JOUR AU - Han, Jeong Hee AU - Mendu, Sanjana AU - Jaworski, K. Beth AU - Owen, E. Jason AU - Abdullah, Saeed PY - 2023/5/29 TI - Preliminary Evaluation of a Conversational Agent to Support Self-management of Individuals Living With Posttraumatic Stress Disorder: Interview Study With Clinical Experts JO - JMIR Form Res SP - e45894 VL - 7 KW - conversational agent KW - PTSD KW - self-management KW - clinical experts KW - evaluation KW - support system KW - mental health KW - trauma N2 - Background: Posttraumatic stress disorder (PTSD) is a serious public health concern. However, individuals with PTSD often do not have access to adequate treatment. A conversational agent (CA) can help to bridge the treatment gap by providing interactive and timely interventions at scale. Toward this goal, we have developed PTSDialogue, a CA to support the self-management of individuals living with PTSD. PTSDialogue is designed to be highly interactive (eg, brief questions, ability to specify preferences, and quick turn-taking) and supports social presence to promote user engagement and sustain adherence. It includes a range of support features, including psychoeducation, assessment tools, and several symptom management tools. Objective: This paper focuses on the preliminary evaluation of PTSDialogue from clinical experts. 
Given that PTSDialogue focuses on a vulnerable population, it is critical to establish its usability and acceptance with clinical experts before deployment. Expert feedback is also important to ensure user safety and effective risk management in CAs aiming to support individuals living with PTSD. Methods: We conducted remote, one-on-one, semistructured interviews with clinical experts (N=10) to gather insight into the use of CAs. All participants have completed their doctoral degrees and have prior experience in PTSD care. The web-based PTSDialogue prototype was then shared with the participant so that they could interact with different functionalities and features. We encouraged them to "think aloud" as they interacted with the prototype. Participants also shared their screens throughout the interaction session. A semistructured interview script was also used to gather insights and feedback from the participants. The sample size is consistent with that of prior works. We analyzed interview data using a qualitative interpretivist approach resulting in a bottom-up thematic analysis. Results: Our data establish the feasibility and acceptance of PTSDialogue, a supportive tool for individuals with PTSD. Most participants agreed that PTSDialogue could be useful for supporting self-management of individuals with PTSD. We have also assessed how features, functionalities, and interactions in PTSDialogue can support different self-management needs and strategies for this population. These data were then used to identify design requirements and guidelines for a CA aiming to support individuals with PTSD. Experts specifically noted the importance of empathetic and tailored CA interactions for effective PTSD self-management. They also suggested steps to ensure safe and engaging interactions with PTSDialogue. Conclusions: Based on interviews with experts, we have provided design recommendations for future CAs aiming to support vulnerable populations. 
The study suggests that well-designed CAs have the potential to reshape effective intervention delivery and help address the treatment gap in mental health. UR - https://formative.jmir.org/2023/1/e45894 UR - http://dx.doi.org/10.2196/45894 UR - http://www.ncbi.nlm.nih.gov/pubmed/37247220 ID - info:doi/10.2196/45894 ER - TY - JOUR AU - Noh, Eunyoung AU - Won, Jiyoon AU - Jo, Sua AU - Hahm, Dae-Hyun AU - Lee, Hyangsook PY - 2023/5/26 TI - Conversational Agents for Body Weight Management: Systematic Review JO - J Med Internet Res SP - e42238 VL - 25 KW - conversational agent KW - chatbot KW - obesity KW - weight management KW - artificial intelligence KW - behavioral therapy N2 - Background: Obesity is a public health issue worldwide. Conversational agents (CAs), also frequently called chatbots, are computer programs that simulate dialogue between people. Owing to better accessibility, cost-effectiveness, personalization, and compassionate patient-centered treatments, CAs have the potential to provide sustainable lifestyle counseling for weight management. Objective: This systematic review aimed to critically summarize and evaluate clinical studies on the effectiveness and feasibility of CAs with unconstrained natural language input for weight management. Methods: PubMed, Embase, the Cochrane Library (CENTRAL), PsycINFO, and ACM Digital Library were searched up to December 2022. Studies were included if CAs were used for weight management and had a capability for unconstrained natural language input. No restrictions were imposed on study design, language, or publication type. The quality of the included studies was assessed using the Cochrane risk-of-bias assessment tool or the Critical Appraisal Skills Programme checklist. The extracted data from the included studies were tabulated and narratively summarized as substantial heterogeneity was expected. 
Results: In total, 8 studies met the eligibility criteria: 3 (38%) randomized controlled trials and 5 (62%) uncontrolled before-and-after studies. The CAs in the included studies were aimed at behavior changes through education, advice on food choices, or counseling via psychological approaches. Of the included studies, only 38% (3/8) reported a substantial weight loss outcome (1.3-2.4 kg decrease at 12-15 weeks of CA use). The overall quality of the included studies was judged as low. Conclusions: The findings of this systematic review suggest that CAs with unconstrained natural language input can be used as a feasible interpersonal weight management intervention by promoting engagement in psychiatric intervention-based conversations simulating treatments by health care professionals, but currently there is a paucity of evidence. Well-designed rigorous randomized controlled trials with larger sample sizes, longer treatment duration, and follow-up focusing on CAs' acceptability, efficacy, and safety are warranted. UR - https://www.jmir.org/2023/1/e42238 UR - http://dx.doi.org/10.2196/42238 UR - http://www.ncbi.nlm.nih.gov/pubmed/37234029 ID - info:doi/10.2196/42238 ER - TY - JOUR AU - Lyzwinski, Nathalie Lynnette AU - Elgendi, Mohamed AU - Menon, Carlo PY - 2023/5/25 TI - Conversational Agents and Avatars for Cardiometabolic Risk Factors and Lifestyle-Related Behaviors: Scoping Review JO - JMIR Mhealth Uhealth SP - e39649 VL - 11 KW - chatbots KW - avatars KW - conversational coach KW - diet KW - physical activity KW - cardiovascular disease KW - hypertension KW - cardiometabolic KW - behavior change KW - hypertension diabetes KW - metabolic syndrome KW - mobile phone N2 - Background: In recent years, there has been a rise in the use of conversational agents for lifestyle medicine, in particular for weight-related behaviors and cardiometabolic risk factors. 
Little is known about the effectiveness and acceptability of and engagement with conversational and virtual agents as well as the applicability of these agents for metabolic syndrome risk factors such as an unhealthy dietary intake, physical inactivity, diabetes, and hypertension. Objective: This review aimed to gain a greater understanding of the virtual agents that have been developed for cardiometabolic risk factors and to review their effectiveness. Methods: A systematic review of PubMed and MEDLINE was conducted to review conversational agents for cardiometabolic risk factors, including chatbots and embodied avatars. Results: A total of 50 studies were identified. Overall, chatbots and avatars appear to have the potential to improve weight-related behaviors such as dietary intake and physical activity. There were limited studies on hypertension and diabetes. Patients seemed interested in using chatbots and avatars for modifying cardiometabolic risk factors, and adherence was acceptable across the studies, except for studies of virtual agents for diabetes. However, there is a need for randomized controlled trials to confirm this finding. As there were only a few clinical trials, more research is needed to confirm whether conversational coaches may assist with cardiovascular disease, diabetes, and physical activity. Conclusions: Conversational coaches may regulate cardiometabolic risk factors; however, quality trials are needed to expand the evidence base. A future chatbot could be tailored to metabolic syndrome specifically, targeting all the areas covered in the literature, which would be novel. 
UR - https://mhealth.jmir.org/2023/1/e39649 UR - http://dx.doi.org/10.2196/39649 UR - http://www.ncbi.nlm.nih.gov/pubmed/37227765 ID - info:doi/10.2196/39649 ER - TY - JOUR AU - Sadasivan, Chikku AU - Cruz, Christofer AU - Dolgoy, Naomi AU - Hyde, Ashley AU - Campbell, Sandra AU - McNeely, Margaret AU - Stroulia, Eleni AU - Tandon, Puneeta PY - 2023/5/22 TI - Examining Patient Engagement in Chatbot Development Approaches for Healthy Lifestyle and Mental Wellness Interventions: Scoping Review JO - J Particip Med SP - e45772 VL - 15 KW - chatbots KW - virtual assistants KW - patient involvement KW - patient engagement KW - codevelopment N2 - Background: Chatbots are growing in popularity as they offer a range of potential benefits to end users and service providers. Objective: Our scoping review aimed to explore studies that used 2-way chatbots to support healthy eating, physical activity, and mental wellness interventions. Our objectives were to report the nontechnical (eg, unrelated to software development) approaches for chatbot development and to examine the level of patient engagement in these reported approaches. Methods: Our team conducted a scoping review following the framework proposed by Arksey and O'Malley. Nine electronic databases were searched in July 2022. Studies were selected based on our inclusion and exclusion criteria. Data were then extracted and patient involvement was assessed. Results: 16 studies were included in this review. We report several approaches to chatbot development, assess patient involvement where possible, and reveal the limited detail available on reporting of patient involvement in the chatbot implementation process. The reported approaches for development included: collaboration with knowledge experts, co-design workshops, patient interviews, prototype testing, the Wizard of Oz (WoZ) procedure, and literature review. 
Reporting of patient involvement in development was limited; only 3 of the 16 included studies contained sufficient information to evaluate patient engagement using the Guidance for Reporting Involvement of Patients and Public (GRIPP2). Conclusions: The approaches reported in this review and the identified limitations can guide the inclusion of patient engagement and the improved documentation of engagement in the chatbot development process for future health care research. Given the importance of end user involvement in chatbot development, we hope that future research will more systematically report on chatbot development and more consistently and actively engage patients in the codevelopment process. UR - https://jopm.jmir.org/2023/1/e45772 UR - http://dx.doi.org/10.2196/45772 UR - http://www.ncbi.nlm.nih.gov/pubmed/37213199 ID - info:doi/10.2196/45772 ER - TY - JOUR AU - Shahsavar, Yeganeh AU - Choudhury, Avishek PY - 2023/5/17 TI - User Intentions to Use ChatGPT for Self-Diagnosis and Health-Related Purposes: Cross-sectional Survey Study JO - JMIR Hum Factors SP - e47564 VL - 10 KW - human factors KW - behavioral intention KW - chatbots KW - health care KW - integrated diagnostics KW - use KW - ChatGPT KW - artificial intelligence KW - users KW - self-diagnosis KW - decision-making KW - integration KW - willingness KW - policy N2 - Background: With the rapid advancement of artificial intelligence (AI) technologies, AI-powered chatbots, such as Chat Generative Pretrained Transformer (ChatGPT), have emerged as potential tools for various applications, including health care. However, ChatGPT is not specifically designed for health care purposes, and its use for self-diagnosis raises concerns regarding its adoption's potential risks and benefits. Users are increasingly inclined to use ChatGPT for self-diagnosis, necessitating a deeper understanding of the factors driving this trend. Objective: This study aims to investigate the factors influencing users' 
perception of decision-making processes and intentions to use ChatGPT for self-diagnosis and to explore the implications of these findings for the safe and effective integration of AI chatbots in health care. Methods: A cross-sectional survey design was used, and data were collected from 607 participants. The relationships between performance expectancy, risk-reward appraisal, decision-making, and intention to use ChatGPT for self-diagnosis were analyzed using partial least squares structural equation modeling (PLS-SEM). Results: Most respondents were willing to use ChatGPT for self-diagnosis (n=476, 78.4%). The model demonstrated satisfactory explanatory power, accounting for 52.4% of the variance in decision-making and 38.1% in the intent to use ChatGPT for self-diagnosis. The results supported all 3 hypotheses: The higher performance expectancy of ChatGPT (β=.547, 95% CI 0.474-0.620) and positive risk-reward appraisals (β=.245, 95% CI 0.161-0.325) were positively associated with the improved perception of decision-making outcomes among users, and enhanced perception of decision-making processes involving ChatGPT positively impacted users' intentions to use the technology for self-diagnosis (β=.565, 95% CI 0.498-0.628). Conclusions: Our research investigated factors influencing users' intentions to use ChatGPT for self-diagnosis and health-related purposes. Even though the technology is not specifically designed for health care, people are inclined to use ChatGPT in health care contexts. Instead of solely focusing on discouraging its use for health care purposes, we advocate for improving the technology and adapting it for suitable health care applications. Our study highlights the importance of collaboration among AI developers, health care providers, and policy makers in ensuring AI chatbots' safe and responsible use in health care. By understanding users' 
expectations and decision-making processes, we can develop AI chatbots, such as ChatGPT, that are tailored to human needs, providing reliable and verified health information sources. This approach not only enhances health care accessibility but also improves health literacy and awareness. As the field of AI chatbots in health care continues to evolve, future research should explore the long-term effects of using AI chatbots for self-diagnosis and investigate their potential integration with other digital health interventions to optimize patient care and outcomes. In doing so, we can ensure that AI chatbots, including ChatGPT, are designed and implemented to safeguard users' well-being and support positive health outcomes in health care settings. UR - https://humanfactors.jmir.org/2023/1/e47564 UR - http://dx.doi.org/10.2196/47564 UR - http://www.ncbi.nlm.nih.gov/pubmed/37195756 ID - info:doi/10.2196/47564 ER - TY - JOUR AU - Su, Ting AU - Calvo, A. Rafael AU - Jouaiti, Melanie AU - Daniels, Sarah AU - Kirby, Pippa AU - Dijk, Derk-Jan AU - della Monica, Ciro AU - Vaidyanathan, Ravi PY - 2023/5/11 TI - Assessing a Sleep Interviewing Chatbot to Improve Subjective and Objective Sleep: Protocol for an Observational Feasibility Study JO - JMIR Res Protoc SP - e45752 VL - 12 KW - automated chatbot KW - behavior analysis KW - conversational agents KW - older adults KW - sleep disorders KW - sleep interview N2 - Background: Sleep disorders are common among the aging population and people with neurodegenerative diseases. Sleep disorders have a strong bidirectional relationship with neurodegenerative diseases, where they accelerate and worsen one another. Although one-to-one individual cognitive behavioral interventions (conducted in-person or on the internet) have shown promise for significant improvements in sleep efficiency among adults, many may experience difficulties accessing interventions with sleep specialists, psychiatrists, or psychologists. 
Therefore, delivering sleep intervention through an automated chatbot platform may be an effective strategy to increase the accessibility and reach of sleep disorder intervention among the aging population and people with neurodegenerative diseases. Objective: This work aims to (1) determine the feasibility and usability of an automated chatbot (named MotivSleep) that conducts sleep interviews to encourage the aging population to report behaviors that may affect their sleep, followed by providing personalized recommendations for better sleep based on participants' self-reported behaviors; (2) assess the self-reported sleep assessment changes before, during, and after using our automated sleep disturbance intervention chatbot; (3) assess the changes in objective sleep assessment recorded by a sleep tracking device before, during, and after using the automated chatbot MotivSleep. Methods: We will recruit 30 older adult participants from West London for this pilot study. Each participant will have a sleep analyzer installed under their mattress. This contactless sleep monitoring device passively records movements, heart rate, and breathing rate while participants are in bed. In addition, each participant will use our proposed chatbot MotivSleep, accessible on WhatsApp, to describe their sleep and behaviors related to their sleep and receive personalized recommendations for better sleep tailored to their specific reasons for disrupted sleep. We will analyze questionnaire responses before and after the study to assess their perception of our proposed chatbot; questionnaire responses before, during, and after the study to assess their subjective sleep quality changes; and sleep parameters recorded by the sleep analyzer throughout the study to assess their objective sleep quality changes. Results: Recruitment will begin in May 2023 through UK Dementia Research Institute Care Research and Technology Centre organized community outreach. 
Data collection will run from May 2023 until December 2023. We hypothesize that participants will perceive our proposed chatbot as intelligent and trustworthy; we also hypothesize that our proposed chatbot can help improve participants' subjective and objective sleep assessment throughout the study. Conclusions: The MotivSleep automated chatbot has the potential to provide additional care to older adults who wish to improve their sleep in more accessible and less costly ways than conventional face-to-face therapy. International Registered Report Identifier (IRRID): PRR1-10.2196/45752 UR - https://www.researchprotocols.org/2023/1/e45752 UR - http://dx.doi.org/10.2196/45752 UR - http://www.ncbi.nlm.nih.gov/pubmed/37166964 ID - info:doi/10.2196/45752 ER - TY - JOUR AU - Trzebiński, Wojciech AU - Claessens, Toni AU - Buhmann, Jeska AU - De Waele, Aurélie AU - Hendrickx, Greet AU - Van Damme, Pierre AU - Daelemans, Walter AU - Poels, Karolien PY - 2023/5/8 TI - The Effects of Expressing Empathy/Autonomy Support Using a COVID-19 Vaccination Chatbot: Experimental Study in a Sample of Belgian Adults JO - JMIR Form Res SP - e41148 VL - 7 KW - COVID-19 KW - vaccinations KW - chatbot KW - empathy KW - autonomy support KW - perceived user autonomy KW - chatbot patronage intention KW - vaccination intention KW - conversational agent KW - public health KW - digital health intervention KW - health promotion N2 - Background: Chatbots are increasingly used to support COVID-19 vaccination programs. Their persuasiveness may depend on the conversation-related context. Objective: This study aims to investigate the moderating role of the conversation quality and chatbot expertise cues in the effects of expressing empathy/autonomy support using COVID-19 vaccination chatbots. 
Methods: This experiment with 196 Dutch-speaking adults living in Belgium, who engaged in a conversation with a chatbot providing vaccination information, used a 2 (empathy/autonomy support expression: present vs absent) × 2 (chatbot expertise cues: expert endorser vs layperson endorser) between-subject design. Chatbot conversation quality was assessed through actual conversation logs. Perceived user autonomy (PUA), chatbot patronage intention (CPI), and vaccination intention shift (VIS) were measured after the conversation, coded from 1 to 5 (PUA, CPI) and from −5 to 5 (VIS). Results: There was a negative interaction effect of chatbot empathy/autonomy support expression and conversation fallback (CF; the percentage of chatbot answers "I do not understand" in a conversation) on PUA (PROCESS macro, model 1, B=−3.358, SE 1.235, t186=2.718, P=.007). Specifically, empathy/autonomy support expression had a more negative effect on PUA when the CF was higher (conditional effect of empathy/autonomy support expression at the CF level of +1SD: B=−.405, SE 0.158, t186=2.564, P=.011; conditional effects nonsignificant for the mean level: B=−0.103, SE 0.113, t186=0.914, P=.36; conditional effects nonsignificant for the −1SD level: B=0.031, SE=0.123, t186=0.252, P=.80). Moreover, an indirect effect of empathy/autonomy support expression on CPI via PUA was more negative when CF was higher (PROCESS macro, model 7, 5000 bootstrap samples, moderated mediation index=−3.676, BootSE 1.614, 95% CI −6.697 to −0.102; conditional indirect effect at the CF level of +1SD: B=−0.443, BootSE 0.202, 95% CI −0.809 to −0.005; conditional indirect effects nonsignificant for the mean level: B=−0.113, BootSE 0.124, 95% CI −0.346 to 0.137; conditional indirect effects nonsignificant for the −1SD level: B=0.034, BootSE 0.132, 95% CI −0.224 to 0.305). Indirect effects of empathy/autonomy support expression on VIS via PUA were marginally more negative when CF was higher. 
No effects of chatbot expertise cues were found. Conclusions: The findings suggest that expressing empathy/autonomy support using a chatbot may harm its evaluation and persuasiveness when the chatbot fails to answer its users' questions. The paper adds to the literature on vaccination chatbots by exploring the conditional effects of chatbot empathy/autonomy support expression. The results will guide policy makers and chatbot developers dealing with vaccination promotion in designing the way chatbots express their empathy and support for user autonomy. UR - https://formative.jmir.org/2023/1/e41148 UR - http://dx.doi.org/10.2196/41148 UR - http://www.ncbi.nlm.nih.gov/pubmed/37074978 ID - info:doi/10.2196/41148 ER - TY - JOUR AU - Han, Rui AU - Todd, Allyson AU - Wardak, Sara AU - Partridge, R. Stephanie AU - Raeside, Rebecca PY - 2023/5/5 TI - Feasibility and Acceptability of Chatbots for Nutrition and Physical Activity Health Promotion Among Adolescents: Systematic Scoping Review With Adolescent Consultation JO - JMIR Hum Factors SP - e43227 VL - 10 KW - chatbot KW - artificial intelligence KW - text message KW - adolescent nutrition KW - physical activity KW - health promotion N2 - Background: Reducing lifestyle risk behaviors among adolescents depends on access to age-appropriate health promotion information. Chatbots, computer programs designed to simulate conversations with human users, have the potential to deliver health information to adolescents to improve their lifestyle behaviors and support behavior change, but research on the feasibility and acceptability of chatbots in the adolescent population is lacking. Objective: This systematic scoping review aims to evaluate the feasibility and acceptability of chatbots in nutrition and physical activity interventions among adolescents. A secondary aim is to consult adolescents to identify features of chatbots that are acceptable and feasible. 
Methods: We searched 6 electronic databases from March to April 2022 (MEDLINE, Embase, Joanna Briggs Institute, the Cumulative Index to Nursing and Allied Health, the Association for Computing Machinery library, and the Institute of Electrical and Electronics Engineers [IEEE] database). Peer-reviewed studies were included that were conducted in the adolescent population (10-19 years old) without any chronic disease, except obesity or type 2 diabetes, and assessed chatbots used in nutrition or physical activity interventions, or both, that encouraged individuals to meet dietary or physical activity guidelines and supported positive behavior change. Studies were screened by 2 independent reviewers, with any queries resolved by a third reviewer. Data were extracted into tables and collated in a narrative summary. Gray literature searches were also undertaken. Results of the scoping review were presented to a diverse youth advisory group (N=16, 13-18 years old) to gain insights into this topic beyond what is published in the literature. Results: The search identified 5558 papers, with 5 (0.1%) studies describing 5 chatbots meeting the inclusion criteria. The 5 chatbots were supported by mobile apps using a combination of the following features: personalized feedback, conversational agents, gamification, and monitoring of behavior change. Of the 5 studies, 2 (40.0%) studies focused on nutrition, 2 (40.0%) studies focused on physical activity, and 1 (20.0%) focused on both nutrition and physical activity. Feasibility and acceptability varied across the 5 studies, with usage rates above 50% in 3 (60.0%) studies. In addition, 3 (60.0%) studies reported health-related outcomes, with only 1 (20.0%) study showing promising effects of the intervention. Adolescents presented novel concerns around the use of chatbots in nutrition and physical activity interventions, including ethical concerns and the use of false or misleading information. 
Conclusions: Limited research is available on chatbots in adolescent nutrition and physical activity interventions, finding insufficient evidence on the acceptability and feasibility of chatbots in the adolescent population. Similarly, adolescent consultation identified issues in the design features that have not been mentioned in the published literature. Therefore, chatbot codesign with adolescents may help ensure that such technology is feasible and acceptable to an adolescent population. UR - https://humanfactors.jmir.org/2023/1/e43227 UR - http://dx.doi.org/10.2196/43227 UR - http://www.ncbi.nlm.nih.gov/pubmed/37145858 ID - info:doi/10.2196/43227 ER - TY - JOUR AU - He, Yuhao AU - Yang, Li AU - Qian, Chunlian AU - Li, Tong AU - Su, Zhengyuan AU - Zhang, Qiang AU - Hou, Xiangqing PY - 2023/4/28 TI - Conversational Agent Interventions for Mental Health Problems: Systematic Review and Meta-analysis of Randomized Controlled Trials JO - J Med Internet Res SP - e43862 VL - 25 KW - chatbot and conversational agent KW - mental health KW - meta-analysis KW - depression KW - anxiety KW - quality of life KW - stress KW - mobile health KW - mHealth KW - digital medicine KW - meta-regression KW - mobile phone N2 - Background: Mental health problems are a crucial global public health concern. Owing to their cost-effectiveness and accessibility, conversational agent interventions (CAIs) are promising in the field of mental health care. Objective: This study aims to present a thorough summary of the traits of CAIs available for a range of mental health problems, find evidence of efficacy, and analyze the statistically significant moderators of efficacy via a meta-analysis of randomized controlled trials. Methods: Web-based databases (Embase, MEDLINE, PsycINFO, CINAHL, Web of Science, and Cochrane) were systematically searched from database inception to October 30, 2021, and updated to May 1, 2022. 
Randomized controlled trials comparing CAIs with any other type of control condition in improving depressive symptoms, generalized anxiety symptoms, specific anxiety symptoms, quality of life or well-being, general distress, stress, mental disorder symptoms, psychosomatic disease symptoms, and positive and negative affect were considered eligible. This study followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Data were extracted by 2 independent reviewers, checked by a third reviewer, and pooled using both random-effects and fixed-effects models. Hedges g was chosen as the effect size. Results: Of the 6900 identified records, a total of 32 studies were included, involving 6089 participants. CAIs showed statistically significant short-term effects compared with control conditions in improving depressive symptoms (g=0.29, 95% CI 0.20-0.38), generalized anxiety symptoms (g=0.29, 95% CI 0.21-0.36), specific anxiety symptoms (g=0.47, 95% CI 0.07-0.86), quality of life or well-being (g=0.27, 95% CI 0.16-0.39), general distress (g=0.33, 95% CI 0.20-0.45), stress (g=0.24, 95% CI 0.08-0.41), mental disorder symptoms (g=0.36, 95% CI 0.17-0.54), psychosomatic disease symptoms (g=0.62, 95% CI 0.14-1.11), and negative affect (g=0.28, 95% CI 0.05-0.51). However, the long-term effects of CAIs for most mental health outcomes were not statistically significant (g=−0.04 to 0.39). Personalization and empathic response were 2 critical facilitators of efficacy. Longer duration of interaction with conversational agents was associated with larger pooled effect sizes. Conclusions: The findings show that CAIs are research-proven interventions that ought to be implemented more widely in mental health care. CAIs are effective and easily acceptable for those with mental health problems. 
The clinical application of this novel digital technology will conserve human health resources and optimize the allocation of mental health services. Trial Registration: PROSPERO CRD42022350130; https://tinyurl.com/mvhk6w9p UR - https://www.jmir.org/2023/1/e43862 UR - http://dx.doi.org/10.2196/43862 UR - http://www.ncbi.nlm.nih.gov/pubmed/37115595 ID - info:doi/10.2196/43862 ER - TY - JOUR AU - Tanaka, Hiroki AU - Saga, Takeshi AU - Iwauchi, Kota AU - Honda, Masato AU - Morimoto, Tsubasa AU - Matsuda, Yasuhiro AU - Uratani, Mitsuhiro AU - Okazaki, Kosuke AU - Nakamura, Satoshi PY - 2023/4/27 TI - The Validation of Automated Social Skills Training in Members of the General Population Over 4 Weeks: Comparative Study JO - JMIR Form Res SP - e44857 VL - 7 KW - social skills training KW - conversational agents KW - role-play KW - feedback KW - multimodal KW - long-term training effects N2 - Background: Social skills training by human trainers is a well-established method of teaching appropriate social and communication skills and strengthening social self-efficacy. Specifically, human social skills training is a fundamental approach to teaching and learning the rules of social interaction. However, it is cost-ineffective and offers low accessibility, since the number of professional trainers is limited. A conversational agent is a system that can communicate with a human being in a natural language. We proposed to overcome the limitations of current social skills training with conversational agents. Our system is capable of speech recognition, response selection, and speech synthesis and can also generate nonverbal behaviors. We developed a system that incorporated automated social skills training that completely adheres to the training model of Bellack et al through a conversational agent. 
Objective: This study aimed to validate the training effect of a conversational agent-based social skills training system in members of the general population during a 4-week training session. We compared 2 groups (with and without training) and hypothesized that the trained group's social skills would improve. Furthermore, this study sought to clarify the effect size for future larger-scale evaluations, including a much larger group of different social pathological phenomena. Methods: For the experiment, 26 healthy Japanese participants were separated into 2 groups, where we hypothesized that group 1 (system trained) would make greater improvement than group 2 (nontrained). System training was done as a 4-week intervention where the participants visited the examination room every week. Each training session included social skills training with a conversational agent for 3 basic skills. We evaluated the training effect using questionnaires in pre- and posttraining evaluations. In addition to the questionnaires, we conducted a performance test that required the social cognition and expression of participants in new role-play scenarios. Blind ratings by third-party trainers were made by watching recorded role-play videos. A nonparametric Wilcoxon rank sum test was performed for each variable. Improvement between pre- and posttraining evaluations was used to compare the 2 groups. Moreover, we compared the statistical significance from the questionnaires and ratings between the 2 groups. Results: Of the 26 recruited participants, 18 completed this experiment: 9 in group 1 and 9 in group 2. Those in group 1 achieved significant improvement in generalized self-efficacy (P=.02; effect size r=0.53). We also found a significant decrease in state anxiety presence (P=.04; r=0.49), measured by the State-Trait Anxiety Inventory (STAI). For ratings by third-party trainers, speech clarity was significantly strengthened in group 1 (P=.03; r=0.30). 
Conclusions: Our findings reveal the usefulness of the automated social skills training after a 4-week training period. This study confirms a large effect size between groups on generalized self-efficacy, state anxiety presence, and speech clarity. UR - https://formative.jmir.org/2023/1/e44857 UR - http://dx.doi.org/10.2196/44857 UR - http://www.ncbi.nlm.nih.gov/pubmed/37103996 ID - info:doi/10.2196/44857 ER - TY - JOUR AU - Thirunavukarasu, James Arun AU - Hassan, Refaat AU - Mahmood, Shathar AU - Sanghera, Rohan AU - Barzangi, Kara AU - El Mukashfi, Mohanned AU - Shah, Sachin PY - 2023/4/21 TI - Trialling a Large Language Model (ChatGPT) in General Practice With the Applied Knowledge Test: Observational Study Demonstrating Opportunities and Limitations in Primary Care JO - JMIR Med Educ SP - e46599 VL - 9 KW - ChatGPT KW - large language model KW - natural language processing KW - decision support techniques KW - artificial intelligence KW - AI KW - deep learning KW - primary care KW - general practice KW - family medicine KW - chatbot N2 - Background: Large language models exhibiting human-level performance in specialized tasks are emerging; examples include Generative Pretrained Transformer 3.5, which underlies the processing of ChatGPT. Rigorous trials are required to understand the capabilities of emerging technology, so that innovation can be directed to benefit patients and practitioners. Objective: Here, we evaluated the strengths and weaknesses of ChatGPT in primary care using the Membership of the Royal College of General Practitioners Applied Knowledge Test (AKT) as a medium. Methods: AKT questions were sourced from a web-based question bank and 2 AKT practice papers. In total, 674 unique AKT questions were inputted to ChatGPT, with the model's answers recorded and compared to correct answers provided by the Royal College of General Practitioners. 
Each question was inputted twice in separate ChatGPT sessions, with answers on repeated trials compared to gauge consistency. Subject difficulty was gauged by referring to examiners' reports from 2018 to 2022. Novel explanations from ChatGPT (defined as information provided that was not inputted within the question or multiple answer choices) were recorded. Performance was analyzed with respect to subject, difficulty, question source, and novel model outputs to explore ChatGPT's strengths and weaknesses. Results: Average overall performance of ChatGPT was 60.17%, which is below the mean passing mark in the last 2 years (70.42%). Accuracy differed between sources (P=.04 and .06). ChatGPT's performance varied with subject category (P=.02 and .02), but variation did not correlate with difficulty (Spearman ρ=−0.241 and −0.238; P=.19 and .20). The proclivity of ChatGPT to provide novel explanations did not affect accuracy (P>.99 and .23). Conclusions: Large language models are approaching human expert-level performance, although further development is required to match the performance of qualified primary care physicians in the AKT. Validated high-performance models may serve as assistants or autonomous clinical tools to ameliorate the general practice workforce crisis. 
UR - https://mededu.jmir.org/2023/1/e46599 UR - http://dx.doi.org/10.2196/46599 UR - http://www.ncbi.nlm.nih.gov/pubmed/37083633 ID - info:doi/10.2196/46599 ER - TY - JOUR AU - Jabir, Ishqi Ahmad AU - Martinengo, Laura AU - Lin, Xiaowen AU - Torous, John AU - Subramaniam, Mythily AU - Tudor Car, Lorainne PY - 2023/4/19 TI - Evaluating Conversational Agents for Mental Health: Scoping Review of Outcomes and Outcome Measurement Instruments JO - J Med Internet Res SP - e44548 VL - 25 KW - conversational agent KW - chatbot KW - mental health KW - mHealth KW - mobile health KW - taxonomy KW - outcomes KW - core outcome set N2 - Background: Rapid proliferation of mental health interventions delivered through conversational agents (CAs) calls for high-quality evidence to support their implementation and adoption. Selecting appropriate outcomes, instruments for measuring outcomes, and assessment methods are crucial for ensuring that interventions are evaluated effectively and with a high level of quality. Objective: We aimed to identify the types of outcomes, outcome measurement instruments, and assessment methods used to assess the clinical, user experience, and technical outcomes in studies that evaluated the effectiveness of CA interventions for mental health. Methods: We undertook a scoping review of the relevant literature to review the types of outcomes, outcome measurement instruments, and assessment methods in studies that evaluated the effectiveness of CA interventions for mental health. We performed a comprehensive search of electronic databases, including PubMed, Cochrane Central Register of Controlled Trials, Embase (Ovid), PsycINFO, and Web of Science, as well as Google Scholar and Google. We included experimental studies evaluating CA mental health interventions. The screening and data extraction were performed independently by 2 review authors in parallel. Descriptive and thematic analyses of the findings were performed. 
Results: We included 32 studies that targeted the promotion of mental well-being (17/32, 53%) and the treatment and monitoring of mental health symptoms (21/32, 66%). The studies reported 203 outcome measurement instruments used to measure clinical outcomes (123/203, 60.6%), user experience outcomes (75/203, 36.9%), technical outcomes (2/203, 1.0%), and other outcomes (3/203, 1.5%). Most of the outcome measurement instruments were used in only 1 study (150/203, 73.9%) and were self-reported questionnaires (170/203, 83.7%), and most were delivered electronically via survey platforms (61/203, 30.0%). No validity evidence was cited for more than half of the outcome measurement instruments (107/203, 52.7%), which were largely created or adapted for the study in which they were used (95/107, 88.8%). Conclusions: The diversity of outcomes and the choice of outcome measurement instruments employed in studies on CAs for mental health point to the need for an established minimum core outcome set and greater use of validated instruments. Future studies should also capitalize on the affordances made available by CAs and smartphones to streamline the evaluation and reduce participants' input burden inherent to self-reporting. 
UR - https://www.jmir.org/2023/1/e44548 UR - http://dx.doi.org/10.2196/44548 UR - http://www.ncbi.nlm.nih.gov/pubmed/37074762 ID - info:doi/10.2196/44548 ER - TY - JOUR AU - Chagas, Azevedo Bruno AU - Pagano, Silvina Adriana AU - Prates, Oliveira Raquel AU - Praes, Cordeiro Elisa AU - Ferreguetti, Kícila AU - Vaz, Helena AU - Reis, Nogueira Zilma Silveira AU - Ribeiro, Bonisson Leonardo AU - Ribeiro, Pinho Antonio Luiz AU - Pedroso, Marques Thais AU - Beleigoli, Alline AU - Oliveira, Alves Clara Rodrigues AU - Marcolino, Soriano Milena PY - 2023/4/3 TI - Evaluating User Experience With a Chatbot Designed as a Public Health Response to the COVID-19 Pandemic in Brazil: Mixed Methods Study JO - JMIR Hum Factors SP - e43135 VL - 10 KW - user experience KW - chatbots KW - telehealth KW - COVID-19 KW - human-computer interaction KW - HCI KW - empirical studies in human-computer interaction KW - empirical studies in HCI KW - health care information systems N2 - Background: The potential of chatbots for screening and monitoring COVID-19 was envisioned since the outbreak of the disease. Chatbots can help disseminate up-to-date and trustworthy information, promote healthy social behavior, and support the provision of health care services safely and at scale. In this scenario and in view of its far-reaching postpandemic impact, it is important to evaluate user experience with this kind of application. Objective: We aimed to evaluate the quality of user experience with a COVID-19 chatbot designed by a large telehealth service in Brazil, focusing on the usability of real users and the exploration of strengths and shortcomings of the chatbot, as revealed in reports by participants in simulated scenarios. Methods: We examined a chatbot developed by a multidisciplinary team and used it as a component within the workflow of a local public health care service. 
The chatbot had 2 core functionalities: assisting web-based screening of COVID-19 symptom severity and providing evidence-based information to the population. From October 2020 to January 2021, we adopted a mixed methods approach and performed a 2-fold evaluation of user experience with our chatbot by following 2 methods: a posttask usability Likert-scale survey presented to all users after concluding their interaction with the bot and an interview with volunteer participants who engaged in a simulated interaction with the bot guided by the interviewer. Results: Usability assessment with 63 users revealed very good scores for chatbot usefulness (4.57), likelihood of being recommended (4.48), ease of use (4.44), and user satisfaction (4.38). Interviews with 15 volunteers provided insights into the strengths and shortcomings of our bot. Comments on the positive aspects and problems reported by users were analyzed in terms of recurrent themes. We identified 6 positive aspects and 15 issues organized in 2 categories: usability of the chatbot and health support offered by it, the former referring to how users can interact with the chatbot and the latter referring to the chatbot's goal in supporting people during the pandemic through the screening process and education of users through informative content. We found 6 themes accounting for what people liked most about our chatbot and why they found it useful: 3 themes pertaining to the usability domain and 3 themes regarding health support. Our findings also identified 15 types of problems producing a negative impact on users: 10 of them related to the usability of the chatbot and 5 related to the health support it provides. Conclusions: Our results indicate that users had an overall positive experience with the chatbot and found the health support relevant. 
Nonetheless, qualitative evaluation of the chatbot indicated challenges and directions to be pursued in improving not only our COVID-19 chatbot but also health chatbots in general. UR - https://humanfactors.jmir.org/2023/1/e43135 UR - http://dx.doi.org/10.2196/43135 UR - http://www.ncbi.nlm.nih.gov/pubmed/36634267 ID - info:doi/10.2196/43135 ER - TY - JOUR AU - Millard, C. Louise A. AU - Johnson, Laura AU - Neaves, R. Samuel AU - Flach, A. Peter AU - Tilling, Kate AU - Lawlor, A. Deborah PY - 2023/3/31 TI - Collecting Food and Drink Intake Data With Voice Input: Development, Usability, and Acceptability Study JO - JMIR Mhealth Uhealth SP - e41117 VL - 11 KW - digital health KW - data collection KW - voice-based approaches KW - Amazon Alexa KW - self-reported data KW - food and drink N2 - Background: Voice-based systems such as Amazon Alexa may be useful for collecting self-reported information in real time from participants of epidemiology studies using verbal input. In epidemiological research studies, self-reported data tend to be collected using short, infrequent questionnaires, in which the items require participants to select from predefined options, which may lead to errors in the information collected and lack of coverage. Voice-based systems offer the potential to collect self-reported information "continuously" over several days or weeks. At present, to the best of our knowledge, voice-based systems have not been used or evaluated for collecting epidemiological data. Objective: We aimed to demonstrate the technical feasibility of using Alexa to collect information from participants, investigate participant acceptability, and provide an initial evaluation of the validity of the collected data. We used food and drink information as an exemplar. Methods: We recruited 45 staff members and students at the University of Bristol (United Kingdom). 
Participants were asked to tell Alexa what they ate or drank for 7 days and to also submit this information using a web-based form. Questionnaires asked for basic demographic information, about their experience during the study, and the acceptability of using Alexa. Results: Of the 37 participants with valid data, most (n=30, 81%) were aged 20 to 39 years and 23 (62%) were female. Across 29 participants with Alexa and web entries corresponding to the same intake event, 60.1% (357/588) of Alexa entries contained the same food and drink information as the corresponding web entry. Most participants reported that Alexa interjected, and this was worse when entering the food and drink information (17/35, 49% of participants said this happened often; 1/35, 3% said this happened always) than when entering the event date and time (6/35, 17% of participants said this happened often; 1/35, 3% said this happened always). Most (28/35, 80%) said they would be happy to use a voice-controlled system for future research. Conclusions: Although there were some issues interacting with the Alexa skill, largely because of its conversational nature and because Alexa interjected if there was a pause in speech, participants were mostly willing to participate in future research studies using Alexa. More studies are needed, especially to trial less conversational interfaces. UR - https://mhealth.jmir.org/2023/1/e41117 UR - http://dx.doi.org/10.2196/41117 UR - http://www.ncbi.nlm.nih.gov/pubmed/37000476 ID - info:doi/10.2196/41117 ER - TY - JOUR AU - Nair, S. Uma AU - Greene, Karah AU - Marhefka, Stephanie AU - Kosyluk, Kristin AU - Galea, T. 
Jerome PY - 2023/3/31 TI - Development of a Conversational Agent for Individuals Ambivalent About Quitting Smoking: Protocol for a Proof-of-Concept Study JO - JMIR Res Protoc SP - e44041 VL - 12 KW - cigarettes KW - conversational agent KW - mhealth KW - smoking cessation N2 - Background: Cigarette smoking is the leading preventable cause of disease and death in the United States. Despite the availability of a plethora of evidence-based smoking cessation resources, less than one-third of individuals who smoke seek cessation services, and individuals using these services are often those who are actively contemplating quitting smoking. There is a distinct dearth of low-cost, scalable interventions to support smokers not ready to quit (ambivalent smokers). Such interventions can assist in gradually promoting smoking behavior changes in this target population until motivation to quit arises, at which time they can be navigated to existing evidence-based smoking cessation interventions. Conversational agents or chatbots could provide cessation education and support to ambivalent smokers to build motivation and navigate them to evidence-based resources when ready to quit. Objective: The goal of our study is to test the proof-of-concept of the development and preliminary feasibility and acceptability of a smoking cessation support chatbot. Methods: We will accomplish our study aims in 2 phases. In phase 1, we will survey 300 ambivalent smokers to determine their preferences and priorities for a smoking cessation support chatbot. A "forced-choice experiment" will be administered to understand participants' preferred characteristics (attributes) of the proposed chatbot prototype. The data gathered will be used to program the prototype. In phase 2, we will invite 25 individuals who smoke to use the developed prototype. 
For this phase, participants will receive an overview of the chatbot and be encouraged to use the chatbot and engage and interact with the programmed attributes and components for a 2-week period. Results: At the end of phase 1, we anticipate identifying key attributes that ambivalent smokers prefer in a smoking cessation support chatbot. At the end of phase 2, chatbot acceptability and feasibility will be assessed. The study was funded in June 2022, and data collection for both phases of the study is currently ongoing. We expect study results to be published by December 2023. Conclusions: Study results will yield a smoking behavior change chatbot prototype developed for ambivalent smokers that will be ready for efficacy testing in a larger study. International Registered Report Identifier (IRRID): DERR1-10.2196/44041 UR - https://www.researchprotocols.org/2023/1/e44041 UR - http://dx.doi.org/10.2196/44041 UR - http://www.ncbi.nlm.nih.gov/pubmed/37000505 ID - info:doi/10.2196/44041 ER - TY - JOUR AU - Sabry Abdel-Messih, Mary AU - Kamel Boulos, N. 
Maged PY - 2023/3/8 TI - ChatGPT in Clinical Toxicology JO - JMIR Med Educ SP - e46876 VL - 9 KW - ChatGPT KW - clinical toxicology KW - organophosphates KW - artificial intelligence KW - AI KW - medical education UR - https://mededu.jmir.org/2023/1/e46876 UR - http://dx.doi.org/10.2196/46876 UR - http://www.ncbi.nlm.nih.gov/pubmed/36867743 ID - info:doi/10.2196/46876 ER - TY - JOUR AU - Eysenbach, Gunther PY - 2023/3/6 TI - The Role of ChatGPT, Generative Language Models, and Artificial Intelligence in Medical Education: A Conversation With ChatGPT and a Call for Papers JO - JMIR Med Educ SP - e46885 VL - 9 KW - artificial intelligence KW - AI KW - ChatGPT KW - generative language model KW - medical education KW - interview KW - future of education UR - https://mededu.jmir.org/2023/1/e46885 UR - http://dx.doi.org/10.2196/46885 UR - http://www.ncbi.nlm.nih.gov/pubmed/36863937 ID - info:doi/10.2196/46885 ER - TY - JOUR AU - Aggarwal, Abhishek AU - Tam, Chi Cheuk AU - Wu, Dezhi AU - Li, Xiaoming AU - Qiao, Shan PY - 2023/2/24 TI - Artificial Intelligence-Based Chatbots for Promoting Health Behavioral Changes: Systematic Review JO - J Med Internet Res SP - e40789 VL - 25 KW - chatbot KW - artificial intelligence KW - AI KW - health behavior change KW - engagement KW - efficacy KW - intervention KW - feasibility KW - usability KW - acceptability KW - mobile phone N2 - Background: Artificial intelligence (AI)-based chatbots can offer personalized, engaging, and on-demand health promotion interventions. Objective: The aim of this systematic review was to evaluate the feasibility, efficacy, and intervention characteristics of AI chatbots for promoting health behavior change. 
Methods: A comprehensive search was conducted in 7 bibliographic databases (PubMed, IEEE Xplore, ACM Digital Library, PsycINFO, Web of Science, Embase, and JMIR publications) for empirical articles published from 1980 to 2022 that evaluated the feasibility or efficacy of AI chatbots for behavior change. The screening, extraction, and analysis of the identified articles were performed by following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Results: Of the 15 included studies, several demonstrated the high efficacy of AI chatbots in promoting healthy lifestyles (n=6, 40%), smoking cessation (n=4, 27%), treatment or medication adherence (n=2, 13%), and reduction in substance misuse (n=1, 7%). However, there were mixed results regarding feasibility, acceptability, and usability. Selected behavior change theories and expert consultation were used to develop the behavior change strategies of AI chatbots, including goal setting, monitoring, real-time reinforcement or feedback, and on-demand support. Real-time user-chatbot interaction data, such as user preferences and behavioral performance, were collected on the chatbot platform to identify ways of providing personalized services. The AI chatbots demonstrated potential for scalability by deployment through accessible devices and platforms (eg, smartphones and Facebook Messenger). The participants also reported that AI chatbots offered a nonjudgmental space for communicating sensitive information. However, the reported results need to be interpreted with caution because of the moderate to high risk of internal validity, insufficient description of AI techniques, and limitation for generalizability. Conclusions: AI chatbots have demonstrated the efficacy of health behavior change interventions among large and diverse populations; however, future studies need to adopt robust randomized control trials to establish definitive conclusions. 
UR - https://www.jmir.org/2023/1/e40789 UR - http://dx.doi.org/10.2196/40789 UR - http://www.ncbi.nlm.nih.gov/pubmed/36826990 ID - info:doi/10.2196/40789 ER - TY - JOUR AU - Gilson, Aidan AU - Safranek, W. Conrad AU - Huang, Thomas AU - Socrates, Vimig AU - Chi, Ling AU - Taylor, Andrew Richard AU - Chartash, David PY - 2023/2/8 TI - How Does ChatGPT Perform on the United States Medical Licensing Examination (USMLE)? The Implications of Large Language Models for Medical Education and Knowledge Assessment JO - JMIR Med Educ SP - e45312 VL - 9 KW - natural language processing KW - NLP KW - MedQA KW - generative pre-trained transformer KW - GPT KW - medical education KW - chatbot KW - artificial intelligence KW - education technology KW - ChatGPT KW - conversational agent KW - machine learning KW - USMLE N2 - Background: Chat Generative Pre-trained Transformer (ChatGPT) is a 175-billion-parameter natural language processing model that can generate conversation-style responses to user input. Objective: This study aimed to evaluate the performance of ChatGPT on questions within the scope of the United States Medical Licensing Examination (USMLE) Step 1 and Step 2 exams, as well as to analyze responses for user interpretability. Methods: We used 2 sets of multiple-choice questions to evaluate ChatGPT's performance, each with questions pertaining to Step 1 and Step 2. The first set was derived from AMBOSS, a commonly used question bank for medical students, which also provides statistics on question difficulty and the performance on an exam relative to the user base. The second set was the National Board of Medical Examiners (NBME) free 120 questions. ChatGPT's performance was compared to 2 other large language models, GPT-3 and InstructGPT. The text output of each ChatGPT response was evaluated across 3 qualitative metrics: logical justification of the answer selected, presence of information internal to the question, and presence of information external to the question. 
Results: Of the 4 data sets, AMBOSS-Step1, AMBOSS-Step2, NBME-Free-Step1, and NBME-Free-Step2, ChatGPT achieved accuracies of 44% (44/100), 42% (42/100), 64.4% (56/87), and 57.8% (59/102), respectively. ChatGPT outperformed InstructGPT by 8.15% on average across all data sets, and GPT-3 performed similarly to random chance. The model demonstrated a significant decrease in performance as question difficulty increased (P=.01) within the AMBOSS-Step1 data set. We found that logical justification for ChatGPT's answer selection was present in 100% of outputs of the NBME data sets. Internal information to the question was present in 96.8% (183/189) of all questions. The presence of information external to the question was 44.5% and 27% lower for incorrect answers relative to correct answers on the NBME-Free-Step1 (P<.001) and NBME-Free-Step2 (P=.001) data sets, respectively. Conclusions: ChatGPT marks a significant improvement in natural language processing models on the tasks of medical question answering. By performing at a greater than 60% threshold on the NBME-Free-Step-1 data set, we show that the model achieves the equivalent of a passing score for a third-year medical student. Additionally, we highlight ChatGPT's capacity to provide logic and informational context across the majority of answers. These facts taken together make a compelling case for the potential applications of ChatGPT as an interactive medical education tool to support learning. UR - https://mededu.jmir.org/2023/1/e45312 UR - http://dx.doi.org/10.2196/45312 UR - http://www.ncbi.nlm.nih.gov/pubmed/36753318 ID - info:doi/10.2196/45312 ER - TY - JOUR AU - Calvo, A. 
Rafael AU - Peters, Dorian AU - Moradbakhti, Laura AU - Cook, Darren AU - Rizos, Georgios AU - Schuller, Bjoern AU - Kallis, Constantinos AU - Wong, Ernie AU - Quint, Jennifer PY - 2023/2/2 TI - Assessing the Feasibility of a Text-Based Conversational Agent for Asthma Support: Protocol for a Mixed Methods Observational Study JO - JMIR Res Protoc SP - e42965 VL - 12 KW - conversational agent KW - chatbot KW - health KW - well-being KW - artificial intelligence KW - health education KW - behavior change KW - asthma N2 - Background: Despite efforts, the UK death rate from asthma is the highest in Europe, and 65% of people with asthma in the United Kingdom do not receive the professional care they are entitled to. Experts have recommended the use of digital innovations to help address the issues of poor outcomes and lack of care access. An automated SMS text messaging-based conversational agent (ie, chatbot) created to provide access to asthma support in a familiar format via a mobile phone has the potential to help people with asthma across demographics and at scale. Such a chatbot could help improve the accuracy of self-assessed risk, improve asthma self-management, increase access to professional care, and ultimately reduce asthma attacks and emergencies. Objective: The aims of this study are to determine the feasibility and usability of a text-based conversational agent that processes a patient's text responses and short sample voice recordings to calculate an estimate of their risk for an asthma exacerbation and then offers follow-up information for lowering risk and improving asthma control; assess the levels of engagement for different groups of users, particularly those who do not access professional services and those with poor asthma control; and assess the extent to which users of the chatbot perceive it as helpful for improving their understanding and self-management of their condition.
Methods: We will recruit 300 adults through four channels for broad reach: Facebook, YouGov, Asthma + Lung UK social media, and the website Healthily (a health self-management app). Participants will be screened, and those who meet inclusion criteria (adults diagnosed with asthma and who use WhatsApp) will be provided with a link to access the conversational agent through WhatsApp on their mobile phones. Participants will be sent scheduled and randomly timed messages to invite them to engage in dialogue about their asthma risk during the period of study. After a data collection period (28 days), participants will respond to questionnaire items related to the quality of the interaction. A pre- and postquestionnaire will measure asthma control before and after the intervention. Results: This study was funded in March 2021 and started in January 2022. We developed a prototype conversational agent, which was iteratively improved with feedback from people with asthma, asthma nurses, and specialist doctors. Fortnightly reviews of iterations by the clinical team began in September 2022 and are ongoing. This feasibility study will start recruitment in January 2023. The anticipated completion of the study is July 2023. A future randomized controlled trial will depend on the outcomes of this study and funding. Conclusions: This feasibility study will inform a follow-up pilot and larger randomized controlled trial to assess the impact of a conversational agent on asthma outcomes, self-management, behavior change, and access to care. 
International Registered Report Identifier (IRRID): PRR1-10.2196/42965 UR - https://www.researchprotocols.org/2023/1/e42965 UR - http://dx.doi.org/10.2196/42965 UR - http://www.ncbi.nlm.nih.gov/pubmed/36729586 ID - info:doi/10.2196/42965 ER - TY - JOUR AU - Biro, Joshua AU - Linder, Courtney AU - Neyens, David PY - 2023/2/1 TI - The Effects of a Health Care Chatbot's Complexity and Persona on User Trust, Perceived Usability, and Effectiveness: Mixed Methods Study JO - JMIR Hum Factors SP - e41017 VL - 10 KW - electronic health record KW - EHR KW - health information KW - health education KW - patient education KW - chatbot KW - virtual agent KW - virtual assistant KW - usability KW - trust KW - adoption KW - artificial intelligence KW - effectiveness N2 - Background: The rising adoption of telehealth provides new opportunities for more effective and equitable health care information mediums. The ability of chatbots to provide a conversational, personal, and comprehendible avenue for learning about health care information makes them a promising tool for addressing health care inequity as health care trends continue toward web-based and remote processes. Although chatbots have been studied in the health care domain for their efficacy for smoking cessation, diet recommendation, and other assistive applications, few studies have examined how specific design characteristics influence the effectiveness of chatbots in providing health information. Objective: Our objective was to investigate the influence of different design considerations on the effectiveness of an educational health care chatbot. Methods: A 2×3 between-subjects study was performed with 2 independent variables: a chatbot's complexity of responses (eg, technical or nontechnical language) and the presented qualifications of the chatbot's persona (eg, doctor, nurse, or nursing student).
Regression models were used to evaluate the impact of these variables on 3 outcome measures: effectiveness, usability, and trust. A qualitative transcript review was also done to review how participants engaged with the chatbot. Results: Analysis of 71 participants found that participants who received technical language responses were significantly more likely to be in the high effectiveness group, which had higher improvements in test scores (odds ratio [OR] 2.73, 95% CI 1.05-7.41; P=.04). Participants with higher health literacy (OR 2.04, 95% CI 1.11-4.00; P=.03) were significantly more likely to trust the chatbot. The participants engaged with the chatbot in a variety of ways, with some taking a conversational approach and others treating the chatbot more like a search engine. Conclusions: Given their increasing popularity, it is vital that we consider how chatbots are designed and implemented. This study showed that factors such as chatbots' persona and language complexity are two design considerations that influence the ability of chatbots to successfully provide health care information. UR - https://humanfactors.jmir.org/2023/1/e41017 UR - http://dx.doi.org/10.2196/41017 UR - http://www.ncbi.nlm.nih.gov/pubmed/36724004 ID - info:doi/10.2196/41017 ER - TY - JOUR AU - Denecke, Kerstin AU - May, Richard PY - 2023/1/30 TI - Developing a Technical-Oriented Taxonomy to Define Archetypes of Conversational Agents in Health Care: Literature Review and Cluster Analysis JO - J Med Internet Res SP - e41583 VL - 25 KW - mobile phone KW - user-computer interface KW - telemedicine KW - communication KW - delivery of health care and methods KW - delivery of health care and trends N2 - Background: The evolution of artificial intelligence and natural language processing generates new opportunities for conversational agents (CAs) that communicate and interact with individuals.
In the health domain, CAs became popular as they allow for simulating the real-life experience in a health care setting, which is the conversation with a physician. However, it is still unclear which technical archetypes of health CAs can be distinguished. Such technical archetypes are required, among other things, for harmonizing evaluation metrics or describing the landscape of health CAs. Objective: The objective of this work was to develop a technical-oriented taxonomy for health CAs and characterize archetypes of health CAs based on their technical characteristics. Methods: We developed a taxonomy of technical characteristics for health CAs based on scientific literature and empirical data and by applying a taxonomy development framework. To demonstrate the applicability of the taxonomy, we analyzed the landscape of health CAs of the last years based on a literature review. To form technical design archetypes of health CAs, we applied a k-means clustering method. Results: Our taxonomy comprises 18 unique dimensions corresponding to 4 perspectives of technical characteristics (setting, data processing, interaction, and agent appearance). Each dimension consists of 2 to 5 characteristics. The taxonomy was validated based on 173 unique health CAs that were identified out of 1671 initially retrieved publications. The 173 CAs were clustered into 4 distinctive archetypes: a text-based ad hoc supporter; a multilingual, hybrid ad hoc supporter; a hybrid, single-language temporary advisor; and, finally, an embodied temporary advisor, rule based with hybrid input and output options. Conclusions: From the cluster analysis, we learned that the time dimension is important from a technical perspective to distinguish health CA archetypes. Moreover, we were able to identify additional distinctive, dominant characteristics that are relevant when evaluating health-related CAs (eg, input and output options or the complexity of the CA personality). 
Our archetypes reflect the current landscape of health CAs, which is characterized by rule based, simple systems in terms of CA personality and interaction. With an increase in research interest in this field, we expect that more complex systems will arise. The archetype-building process should be repeated after some time to check whether new design archetypes emerge. UR - https://www.jmir.org/2023/1/e41583 UR - http://dx.doi.org/10.2196/41583 UR - http://www.ncbi.nlm.nih.gov/pubmed/36716093 ID - info:doi/10.2196/41583 ER - TY - JOUR AU - Weeks, Rose AU - Sangha, Pooja AU - Cooper, Lyra AU - Sedoc, João AU - White, Sydney AU - Gretz, Shai AU - Toledo, Assaf AU - Lahav, Dan AU - Hartner, Anna-Maria AU - Martin, M. Nina AU - Lee, Hyoung Jae AU - Slonim, Noam AU - Bar-Zeev, Naor PY - 2023/1/30 TI - Usability and Credibility of a COVID-19 Vaccine Chatbot for Young Adults and Health Workers in the United States: Formative Mixed Methods Study JO - JMIR Hum Factors SP - e40533 VL - 10 KW - COVID-19 KW - chatbot development KW - risk communication KW - vaccine hesitancy KW - conversational agent KW - health information KW - chatbot KW - natural language processing KW - usability KW - user feedback N2 - Background: The COVID-19 pandemic raised novel challenges in communicating reliable, continually changing health information to a broad and sometimes skeptical public, particularly around COVID-19 vaccines, which, despite being comprehensively studied, were the subject of viral misinformation. Chatbots are a promising technology to reach and engage populations during the pandemic. To inform and communicate effectively with users, chatbots must be highly usable and credible. Objective: We sought to understand how young adults and health workers in the United States assessed the usability and credibility of a web-based chatbot called Vira, created by the Johns Hopkins Bloomberg School of Public Health and IBM Research using natural language processing technology. 
Using a mixed method approach, we sought to rapidly improve Vira's user experience to support vaccine decision-making during the peak of the COVID-19 pandemic. Methods: We recruited racially and ethnically diverse young people and health workers, with both groups from urban areas of the United States. We used the validated Chatbot Usability Questionnaire to understand the tool's navigation, precision, and persona. We also conducted 11 interviews with health workers and young people to understand the user experience, whether they perceived the chatbot as confidential and trustworthy, and how they would use the chatbot. We coded and categorized emerging themes to understand the determining factors for participants' assessment of chatbot usability and credibility. Results: In all, 58 participants completed a web-based usability questionnaire and 11 completed in-depth interviews. Most questionnaire respondents said the chatbot was "easy to navigate" (51/58, 88%) and "very easy to use" (50/58, 86%), and many (45/58, 78%) said its responses were relevant. The mean Chatbot Usability Questionnaire score was 70.2 (SD 12.1) and scores ranged from 40.6 to 95.3. Interview participants felt the chatbot achieved high usability due to its strong functionality, performance, and perceived confidentiality and that the chatbot could attain high credibility with a redesign of its cartoonish visual persona. Young people said they would use the chatbot to discuss vaccination with hesitant friends or family members, whereas health workers used or anticipated using the chatbot to support community outreach, save time, and stay up to date. Conclusions: This formative study conducted during the pandemic's peak provided user feedback for an iterative redesign of Vira.
Using a mixed method approach provided multidimensional feedback, identifying how the chatbot worked well (being easy to use, answering questions appropriately, and using credible branding) while offering tangible steps to improve the product's visual design. Future studies should evaluate how chatbots support personal health decision-making, particularly in the context of a public health emergency, and whether such outreach tools can reduce staff burnout. Randomized studies should also be conducted to measure how chatbots countering health misinformation affect user knowledge, attitudes, and behavior. UR - https://humanfactors.jmir.org/2023/1/e40533 UR - http://dx.doi.org/10.2196/40533 UR - http://www.ncbi.nlm.nih.gov/pubmed/36409300 ID - info:doi/10.2196/40533 ER - TY - JOUR AU - Chrimes, Dillon PY - 2023/1/30 TI - Using Decision Trees as an Expert System for Clinical Decision Support for COVID-19 JO - Interact J Med Res SP - e42540 VL - 12 KW - assessment tool KW - chatbot KW - clinical decision support KW - COVID-19 KW - decision tree KW - digital health tool KW - framework KW - health informatics KW - health intervention KW - prototype UR - https://www.i-jmr.org/2023/1/e42540 UR - http://dx.doi.org/10.2196/42540 UR - http://www.ncbi.nlm.nih.gov/pubmed/36645840 ID - info:doi/10.2196/42540 ER - TY - JOUR AU - Chin, Hyojin AU - Lima, Gabriel AU - Shin, Mingi AU - Zhunis, Assem AU - Cha, Chiyoung AU - Choi, Junghoi AU - Cha, Meeyoung PY - 2023/1/27 TI - User-Chatbot Conversations During the COVID-19 Pandemic: Study Based on Topic Modeling and Sentiment Analysis JO - J Med Internet Res SP - e40922 VL - 25 KW - chatbot KW - COVID-19 KW - topic modeling KW - sentiment analysis KW - infodemiology KW - discourse KW - public perception KW - public health KW - infoveillance KW - conversational agent KW - global health KW - health information N2 - Background: Chatbots have become a promising tool to support public health initiatives.
Despite their potential, little research has examined how individuals interacted with chatbots during the COVID-19 pandemic. Understanding user-chatbot interactions is crucial for developing services that can respond to people's needs during a global health emergency. Objective: This study examined the COVID-19 pandemic-related topics online users discussed with a commercially available social chatbot and compared the sentiment expressed by users from 5 culturally different countries. Methods: We analyzed 19,782 conversation utterances related to COVID-19 covering 5 countries (the United States, the United Kingdom, Canada, Malaysia, and the Philippines) between 2020 and 2021, from SimSimi, one of the world's largest open-domain social chatbots. We identified chat topics using natural language processing methods and analyzed their emotional sentiments. Additionally, we compared the topic and sentiment variations in the COVID-19-related chats across countries. Results: Our analysis identified 18 emerging topics, which could be categorized into the following 5 overarching themes: "Questions on COVID-19 asked to the chatbot" (30.6%), "Preventive behaviors" (25.3%), "Outbreak of COVID-19" (16.4%), "Physical and psychological impact of COVID-19" (16.0%), and "People and life in the pandemic" (11.7%). Our data indicated that people considered chatbots as a source of information about the pandemic, for example, by asking health-related questions. Users turned to SimSimi for conversation and emotional messages when offline social interactions became limited during the lockdown period. Users were more likely to express negative sentiments when conversing about topics related to masks, lockdowns, case counts, and their worries about the pandemic. In contrast, small talk with the chatbot was largely accompanied by positive sentiment.
We also found cultural differences, with negative words being used more often by users in the United States than by those in Asia when talking about COVID-19. Conclusions: Based on the analysis of user-chatbot interactions on a live platform, this work provides insights into people's informational and emotional needs during a global health crisis. Users sought health-related information and shared emotional messages with the chatbot, indicating the potential use of chatbots to provide accurate health information and emotional support. Future research can look into different support strategies that align with the direction of public health policy. UR - https://www.jmir.org/2023/1/e40922 UR - http://dx.doi.org/10.2196/40922 UR - http://www.ncbi.nlm.nih.gov/pubmed/36596214 ID - info:doi/10.2196/40922 ER - TY - JOUR AU - Sinha, Chaitali AU - Meheli, Saha AU - Kadaba, Madhura PY - 2023/1/26 TI - Understanding Digital Mental Health Needs and Usage With an Artificial Intelligence-Led Mental Health App (Wysa) During the COVID-19 Pandemic: Retrospective Analysis JO - JMIR Form Res SP - e41913 VL - 7 KW - digital mental health KW - COVID-19 KW - engagement KW - retention KW - perceived needs KW - pandemic waves KW - chatbot KW - conversational agent KW - mental health app KW - mobile health KW - digital health intervention N2 - Background: There has been a surge in mental health concerns during the COVID-19 pandemic, which has prompted the increased use of digital platforms. However, little is known about the mental health needs and behaviors of the global population during the pandemic. This study aims to fill this knowledge gap through the analysis of real-world data collected from users of a digital mental health app (Wysa) regarding their engagement patterns and behaviors, as shown by their usage of the service.
Objective: This study aims to (1) examine the relationship between mental health distress, digital health uptake, and COVID-19 case numbers; (2) evaluate engagement patterns with the app during the study period; and (3) examine the efficacy of the app in improving mental health outcomes for its users during the pandemic. Methods: This study used a retrospective observational design. During the COVID-19 pandemic, the app's installations and emotional utterances were measured from March 2020 to October 2021 for the United Kingdom, the United States of America, and India and were mapped against COVID-19 case numbers and their peaks. The engagement of the users from this period (N=4541) with the Wysa app was compared to that of equivalent samples of users from a pre-COVID-19 period (1000 iterations). The efficacy was assessed for users who completed pre-post assessments for symptoms of depression (n=2061) and anxiety (n=1995) on the Patient Health Questionnaire-9 (PHQ-9) and Generalized Anxiety Disorder-7 (GAD-7) test measures, respectively. Results: Our findings demonstrate a significant positive correlation between the increase in the number of installs of the Wysa mental health app and the peaks of COVID-19 case numbers in the United Kingdom (P=.02) and India (P<.001). Findings indicate that users (N=4541) during the COVID period had a significantly higher engagement than the samples from the pre-COVID period, with a medium to large effect size for 80% of these 1000 iterative samples, as observed on the Mann-Whitney test. The PHQ-9 and GAD-7 pre-post assessments indicated statistically significant improvement with a medium effect size (PHQ-9: P=.57; GAD-7: P=.56).
Conclusions: This study demonstrates that emotional distress increased substantially during the pandemic, prompting the increased uptake of an artificial intelligence-led mental health app (Wysa), and also offers evidence that the Wysa app could support its users and its usage could result in a significant reduction in symptoms of anxiety and depression. This study also highlights the importance of contextualizing interventions and suggests that digital health interventions can provide large populations with scalable and evidence-based support for mental health care. UR - https://formative.jmir.org/2023/1/e41913 UR - http://dx.doi.org/10.2196/41913 UR - http://www.ncbi.nlm.nih.gov/pubmed/36540052 ID - info:doi/10.2196/41913 ER - TY - JOUR AU - Perez-Ramos, G. Jose AU - Leon-Thomas, Mariela AU - Smith, L. Sabrina AU - Silverman, Laura AU - Perez-Torres, Claudia AU - Hall, C. Wyatte AU - Iadarola, Suzannah PY - 2023/1/25 TI - COVID-19 Vaccine Equity and Access: Case Study for Health Care Chatbots JO - JMIR Form Res SP - e39045 VL - 7 KW - mHealth KW - ICT KW - Information and Communication Technology KW - community KW - chatbot KW - COVID-19 KW - health equity KW - mobile health KW - health outcome KW - health disparity KW - minority population KW - health care gap KW - chatbot tool KW - user experience KW - chatbot development KW - health information N2 - Background: Disparities in COVID-19 information and vaccine access have emerged during the pandemic. Individuals from historically excluded communities (eg, Black and Latin American) experience disproportionately negative health outcomes related to COVID-19. Community gaps in COVID-19 education, social, and health care services (including vaccines) should be prioritized as a critical effort to end the pandemic.
Misinformation created by the politicization of COVID-19 and related public health measures has magnified the pandemic's challenges, including access to health care, vaccination and testing efforts, as well as personal protective equipment. Information and Communication Technology (ICT) has been demonstrated to reduce the gaps of marginalization in education and access among communities. Chatbots are an increasingly present example of ICTs, particularly in health care and in relation to the COVID-19 pandemic. Objective: This project aimed to (1) follow an inclusive and theoretically driven design process to develop and test a COVID-19 information ICT bilingual (English and Spanish) chatbot tool named "Ana" and (2) characterize and evaluate user experiences of these innovative technologies. Methods: Ana was developed following a multitheoretical framework, and the project team comprised public health experts, behavioral scientists, community members, and a medical team. A total of 7 iterations of beta chatbots were tested, and a total of 22 beta testers participated in this process. Content was curated primarily to provide users with factual answers to common questions about COVID-19. To ensure relevance of the content, topics were driven by community concerns and questions, as ascertained through research. Ana's repository of educational content was based on national and international organizations as well as interdisciplinary experts. In the context of this development and pilot project, we identified an evaluation framework to explore reach, engagement, and satisfaction. Results: A total of 626 community members used Ana from August 2021 to March 2022. Among those participants, 346 used the English version, with an average of 43 users per month; and 280 participants used the Spanish version, with an average of 40 users monthly.
Across all users, 63.87% (n=221) of English users and 22.14% (n=62) of Spanish users returned to use Ana at least once; 18.49% (n=64) among the English version users and 18.57% (n=52) among the Spanish version users reported their ranking. Positive ranking comprised the "smiley" and "loved" emojis, and negative ranking comprised the "neutral," "sad," and "mad" emojis. When comparing negative and positive experiences, the latter was higher across Ana's platforms (English: n=41, 64.06%; Spanish: n=41, 77.35%) versus the former (English: n=23, 35.93%; Spanish: n=12, 22.64%). Conclusions: This pilot project demonstrated the feasibility and capacity of an innovative ICT to share COVID-19 information within diverse communities. Creating a chatbot like Ana with bilingual content contributed to an equitable approach to address the lack of accessible COVID-19-related information. UR - https://formative.jmir.org/2023/1/e39045 UR - http://dx.doi.org/10.2196/39045 UR - http://www.ncbi.nlm.nih.gov/pubmed/36630649 ID - info:doi/10.2196/39045 ER - TY - JOUR AU - Wlasak, Wendy AU - Zwanenburg, Paul Sander AU - Paton, Chris PY - 2023/1/25 TI - Supporting Autonomous Motivation for Physical Activity With Chatbots During the COVID-19 Pandemic: Factorial Experiment JO - JMIR Form Res SP - e38500 VL - 7 KW - autonomous motivation KW - chatbots KW - self-determination theory KW - physical activity KW - factorial experiment KW - mobile phone KW - COVID-19 N2 - Background: Although physical activity can mitigate disease trajectories and improve and sustain mental health, many people have become less physically active during the COVID-19 pandemic. Personal information technology, such as activity trackers and chatbots, can technically converse with people and possibly enhance their autonomous motivation to engage in physical activity.
The literature on behavior change techniques (BCTs) and self-determination theory (SDT) contains promising insights that can be leveraged in the design of these technologies; however, it remains unclear how this can be achieved. Objective: This study aimed to evaluate the feasibility of a chatbot system that improves the user's autonomous motivation for walking based on BCTs and SDT. First, we aimed to develop and evaluate various versions of a chatbot system based on promising BCTs. Second, we aimed to evaluate whether the use of the system improves the autonomous motivation for walking and the associated factors of need satisfaction. Third, we explored the support for the theoretical mechanism and effectiveness of various BCT implementations. Methods: We developed a chatbot system using the mobile apps Telegram (Telegram Messenger Inc) and Google Fit (Google LLC). We implemented 12 versions of this system, which differed in 3 BCTs: goal setting, experimenting, and action planning. We then conducted a feasibility study with 102 participants who used this system over the course of 3 weeks, by conversing with a chatbot and completing questionnaires, capturing their perceived app support, need satisfaction, physical activity levels, and motivation. Results: The use of the chatbot systems was satisfactory, and on average, its users reported increases in autonomous motivation for walking. The dropout rate was low. Although approximately half of the participants indicated that they would have preferred to interact with a human instead of the chatbot, 46.1% (47/102) of the participants stated that the chatbot helped them become more active, and 42.2% (43/102) of the participants decided to continue using the chatbot for an additional week. Furthermore, the majority thought that a more advanced chatbot could be very helpful.
The motivation was associated with the satisfaction of the needs of competence and autonomy, and need satisfaction, in turn, was associated with the perceived system support, providing support for SDT underpinnings. However, no substantial differences were found across different BCT implementations. Conclusions: The results provide evidence that chatbot systems are a feasible means to increase autonomous motivation for physical activity. We found support for SDT as a basis for the design, laying a foundation for larger studies to confirm the effectiveness of the selected BCTs within chatbot systems, explore a wider range of BCTs, and help the development of guidelines for the design of interactive technology that helps users achieve long-term health benefits. UR - https://formative.jmir.org/2023/1/e38500 UR - http://dx.doi.org/10.2196/38500 UR - http://www.ncbi.nlm.nih.gov/pubmed/36512402 ID - info:doi/10.2196/38500 ER - TY - JOUR AU - Ferré, Fabrice AU - Laurent, Rodolphe AU - Furelau, Philippine AU - Doumard, Emmanuel AU - Ferrier, Anne AU - Bosch, Laetitia AU - Ba, Cyndie AU - Menut, Rémi AU - Kurrek, Matt AU - Geeraerts, Thomas AU - Piau, Antoine AU - Minville, Vincent PY - 2023/1/16 TI - Perioperative Risk Assessment of Patients Using the MyRISK Digital Score Completed Before the Preanesthetic Consultation: Prospective Observational Study JO - JMIR Perioper Med SP - e39044 VL - 6 KW - chatbot KW - digital health KW - preanesthetic consultation KW - perioperative risk KW - machine learning KW - mobile phone N2 - Background: The ongoing COVID-19 pandemic has highlighted the potential of digital health solutions to adapt the organization of care in a crisis context. Objective: Our aim was to describe the relationship between the MyRISK score, derived from self-reported data collected by a chatbot before the preanesthetic consultation, and the occurrence of postoperative complications. 
Methods: This was a single-center prospective observational study that included 401 patients. The 16 items composing the MyRISK score were selected using the Delphi method. An algorithm was used to stratify patients with low (green), intermediate (orange), and high (red) risk. The primary end point concerned postoperative complications occurring in the first 6 months after surgery (composite criterion), collected by telephone and by consulting the electronic medical database. A logistic regression analysis was carried out to identify the explanatory variables associated with the complications. A machine learning model was trained to predict the MyRISK score using a larger data set of 1823 patients classified as green or red to reclassify individuals classified as orange as either modified green or modified red. User satisfaction and usability were assessed. Results: Of the 389 patients analyzed for the primary end point, 16 (4.1%) experienced a postoperative complication. A red score was independently associated with postoperative complications (odds ratio 5.9, 95% CI 1.5-22.3; P=.009). A modified red score was strongly correlated with postoperative complications (odds ratio 21.8, 95% CI 2.8-171.5; P=.003) and predicted postoperative complications with high sensitivity (94%) and high negative predictive value (99%) but with low specificity (49%) and very low positive predictive value (7%; area under the receiver operating characteristic curve=0.71). Patient satisfaction numeric rating scale and system usability scale median scores were 8.0 (IQR 7.0-9.0) out of 10 and 90.0 (IQR 82.5-95.0) out of 100, respectively. Conclusions: The MyRISK digital perioperative risk score established before the preanesthetic consultation was independently associated with the occurrence of postoperative complications. Its negative predictive strength was increased using a machine learning model to reclassify patients identified as being at intermediate risk. 
This reliable numerical categorization could be used to objectively refer patients with low risk to teleconsultation. UR - https://periop.jmir.org/2023/1/e39044 UR - http://dx.doi.org/10.2196/39044 UR - http://www.ncbi.nlm.nih.gov/pubmed/36645704 ID - info:doi/10.2196/39044 ER - TY - JOUR AU - Yoshii, Kenta AU - Kimura, Daiki AU - Kosugi, Akihiro AU - Shinkawa, Kaoru AU - Takase, Toshiro AU - Kobayashi, Masatomo AU - Yamada, Yasunori AU - Nemoto, Miyuki AU - Watanabe, Ryohei AU - Ota, Miho AU - Higashi, Shinji AU - Nemoto, Kiyotaka AU - Arai, Tetsuaki AU - Nishimura, Masafumi PY - 2023/1/13 TI - Screening of Mild Cognitive Impairment Through Conversations With Humanoid Robots: Exploratory Pilot Study JO - JMIR Form Res SP - e42792 VL - 7 KW - mild cognitive impairment KW - Alzheimer disease KW - neuropsychiatric symptoms KW - neuropsychological assessment KW - simple screening KW - humanoid robot KW - robot KW - symptoms KW - neuropsychological KW - monitoring N2 - Background: The rising number of patients with dementia has become a serious social problem worldwide. To help detect dementia at an early stage, many studies have been conducted to detect signs of cognitive decline by prosodic and acoustic features. However, many of these methods are not suitable for everyday use as they focus on cognitive function or conversational speech during the examinations. In contrast, conversational humanoid robots are expected to be used in the care of older people to help reduce the work of care and monitoring through interaction. Objective: This study focuses on early detection of mild cognitive impairment (MCI) through conversations between patients and humanoid robots without a specific examination, such as neuropsychological examination. Methods: This was an exploratory study involving patients with MCI and cognitively normal (CN) older people. 
We collected the conversation data during neuropsychological examination (Mini-Mental State Examination [MMSE]) and everyday conversation between a humanoid robot and 94 participants (n=47, 50%, patients with MCI and n=47, 50%, CN older people). We extracted 17 types of prosodic and acoustic features, such as the duration of response time and jitter, from these conversations. We conducted a statistical significance test for each feature to clarify the speech features that are useful when classifying people into CN people and patients with MCI. Furthermore, we conducted an automatic classification experiment using a support vector machine (SVM) to verify whether it is possible to automatically classify these 2 groups by the features identified in the statistical significance test. Results: We obtained significant differences in 5 (29%) of 17 types of features obtained from the MMSE conversational speech. The duration of response time, the duration of silent periods, and the proportion of silent periods showed a significant difference (P<.001) and met the reference value r=0.1 (small) of the effect size. Additionally, filler periods (P<.01) and the proportion of fillers (P=.02) showed a significant difference; however, these did not meet the reference value of the effect size. In contrast, we obtained significant differences in 16 (94%) of 17 types of features obtained from the everyday conversations with the humanoid robot. The duration of response time, the duration of speech periods, jitter (local, relative average perturbation [rap], 5-point period perturbation quotient [ppq5], difference of difference of periods [ddp]), shimmer (local, amplitude perturbation quotient [apq]3, apq5, apq11, average absolute differences between the amplitudes of consecutive periods [dda]), and F0cov (coefficient of variation of the fundamental frequency) showed a significant difference (P<.001). 
In addition, the duration of response time, the duration of silent periods, the filler period, and the proportion of fillers showed significant differences (P<.05). However, only jitter (local) met the reference value r=0.1 (small) of the effect size. In the automatic classification experiment for the classification of participants into CN and MCI groups, the results showed 66.0% accuracy in the MMSE conversational speech and 68.1% accuracy in everyday conversations with the humanoid robot. Conclusions: This study shows the possibility of early and simple screening for patients with MCI using prosodic and acoustic features from everyday conversations with a humanoid robot with the same level of accuracy as the MMSE. UR - https://formative.jmir.org/2023/1/e42792 UR - http://dx.doi.org/10.2196/42792 UR - http://www.ncbi.nlm.nih.gov/pubmed/36637896 ID - info:doi/10.2196/42792 ER - TY - JOUR AU - White, K. Becky AU - Martin, Annegret AU - White, Angus James PY - 2022/12/27 TI - User Experience of COVID-19 Chatbots: Scoping Review JO - J Med Internet Res SP - e35903 VL - 24 IS - 12 KW - COVID-19 KW - chatbot KW - engagement KW - user experience KW - pandemic KW - global health KW - digital health KW - health information N2 - Background: The COVID-19 pandemic has had global impacts and caused some health systems to experience substantial pressure. The need for accurate health information has been felt widely. Chatbots have great potential to reach people with authoritative information, and a number of chatbots have been quickly developed to disseminate information about COVID-19. However, little is known about user experiences of and perspectives on these tools. Objective: This study aimed to describe what is known about the user experience and user uptake of COVID-19 chatbots. Methods: A scoping review was carried out in June 2021 using keywords to cover the literature concerning chatbots, user engagement, and COVID-19. 
The search strategy included databases covering health, communication, marketing, and the COVID-19 pandemic specifically, including MEDLINE Ovid, Embase, CINAHL, ACM Digital Library, Emerald, and EBSCO. Studies that assessed the design, marketing, and user features of COVID-19 chatbots or those that explored user perspectives and experience were included. We excluded papers that were not related to COVID-19; did not include any reporting on user perspectives, experience, or the general use of chatbot features or marketing; or where a version was not available in English. The authors independently screened results for inclusion, using both backward and forward citation checking of the included papers. A thematic analysis was carried out with the included papers. Results: A total of 517 papers were sourced from the literature, and 10 were included in the final review. Our scoping review identified a number of factors impacting adoption and engagement including content, trust, digital ability, and acceptability. The papers included discussions about chatbots developed for COVID-19 screening and general COVID-19 information, as well as studies investigating user perceptions and opinions on COVID-19 chatbots. Conclusions: The COVID-19 pandemic presented a unique and specific challenge for digital health interventions. Design and implementation were required at a rapid speed as digital health service adoption accelerated globally. Chatbots for COVID-19 have been developed quickly as the pandemic has challenged health systems. There is a need for more comprehensive and routine reporting of factors impacting adoption and engagement. This paper has shown both the potential of chatbots to reach users in an emergency and the need to better understand how users engage and what they want. 
UR - https://www.jmir.org/2022/12/e35903 UR - http://dx.doi.org/10.2196/35903 UR - http://www.ncbi.nlm.nih.gov/pubmed/36520624 ID - info:doi/10.2196/35903 ER - TY - JOUR AU - Ntinga, Xolani AU - Musiello, Franco AU - Keter, Kipyegon Alfred AU - Barnabas, Ruanne AU - van Heerden, Alastair PY - 2022/12/12 TI - The Feasibility and Acceptability of an mHealth Conversational Agent Designed to Support HIV Self-testing in South Africa: Cross-sectional Study JO - J Med Internet Res SP - e39816 VL - 24 IS - 12 KW - HIV KW - HIV self-testing KW - HIVST KW - chatbot KW - conversational agents KW - mobile health KW - mHealth KW - mobile phone N2 - Background: HIV testing rates in sub-Saharan Africa remain below the targeted threshold, and primary care facilities struggle to provide adequate services. Innovative approaches that leverage digital technologies could improve HIV testing and access to treatment. Objective: This study aimed to examine the feasibility and acceptability of Nolwazi_bot. It is an isiZulu-speaking conversational agent designed to support HIV self-testing (HIVST) in KwaZulu-Natal, South Africa. Methods: Nolwazi_bot was designed with 4 different personalities that users could choose when selecting a counselor for their HIVST session. We recruited a convenience sample of 120 consenting adults and invited them to undertake an HIV self-test facilitated by the Nolwazi_bot. After testing, participants completed an interviewer-led posttest structured survey to assess their experience with the chatbot-supported HIVST. Results: Participants (N=120) ranged in age from 18 to 47 years, with half of them being men (61/120, 50.8%). Of the 120 participants, 111 (92.5%) had tested with a human counselor more than once. Of the 120 participants, 45 (37.5%) chose to be counseled by the female Nolwazi_bot personality aged between 18 and 25 years. Approximately one-fifth (21/120, 17.5%) of the participants who underwent an HIV self-test guided by the chatbot tested positive. 
Most participants (95/120, 79.2%) indicated that their HIV testing experience with a chatbot was much better than that with a human counselor. Many participants (93/120, 77.5%) reported that they felt as if they were talking to a real person, stating that the response tone and word choice of Nolwazi_bot reminded them of how they speak in daily conversations. Conclusions: The study provides insights into the potential of digital technology interventions to support HIVST in low-income and middle-income countries. Although we wait to see the full benefits of mobile health, technological interventions including conversational agents or chatbots provide us with an excellent opportunity to improve HIVST by addressing the barriers associated with clinic-based HIV testing. UR - https://www.jmir.org/2022/12/e39816 UR - http://dx.doi.org/10.2196/39816 UR - http://www.ncbi.nlm.nih.gov/pubmed/36508248 ID - info:doi/10.2196/39816 ER - TY - JOUR AU - Yu, Shubin AU - Zhao, Luming PY - 2022/12/8 TI - Designing Emotions for Health Care Chatbots: Text-Based or Icon-Based Approach JO - J Med Internet Res SP - e39573 VL - 24 IS - 12 KW - chatbot KW - health care KW - emotion KW - psychological distance KW - perception KW - human behavior KW - behavioral intention KW - predict KW - emotional intensity KW - text-based KW - icon-based KW - design UR - https://www.jmir.org/2022/12/e39573 UR - http://dx.doi.org/10.2196/39573 UR - http://www.ncbi.nlm.nih.gov/pubmed/36454078 ID - info:doi/10.2196/39573 ER - TY - JOUR AU - Seah, L. Cassandra E. AU - Zhang, Zheyuan AU - Sun, Sijin AU - Wiskerke, Esther AU - Daniels, Sarah AU - Porat, Talya AU - Calvo, A. 
Rafael PY - 2022/12/6 TI - Designing Mindfulness Conversational Agents for People With Early-Stage Dementia and Their Caregivers: Thematic Analysis of Expert and User Perspectives JO - JMIR Aging SP - e40360 VL - 5 IS - 4 KW - mindfulness KW - dyadic KW - dementia KW - caregivers KW - user needs KW - intervention KW - user KW - feedback KW - design KW - accessibility KW - relationships KW - mindset KW - essential N2 - Background: The number of people with dementia is expected to grow worldwide. Among the ways to support both persons with early-stage dementia and their caregivers (dyads), researchers are studying mindfulness interventions. However, few studies have explored technology-enhanced mindfulness interventions for dyads and the needs of persons with dementia and their caregivers. Objective: The main aim of this study was to elicit essential needs from people with dementia, their caregivers, dementia experts, and mindfulness experts to identify themes that can be used in the design of mindfulness conversational agents for dyads. Methods: Semistructured interviews were conducted with 5 dementia experts, 5 mindfulness experts, 5 people with early-stage dementia, and 5 dementia caregivers. Interviews were transcribed and coded on NVivo (QSR International) before themes were identified through a bottom-up inductive approach. Results: The results revealed that dyadic mindfulness is preferred and that implementation formats such as conversational agents have potential. A total of 5 common themes were also identified from expert and user feedback, which should be used to design mindfulness conversational agents for persons with dementia and their caregivers. The 5 themes included enhancing accessibility, cultivating positivity, providing simplified tangible and thought-based activities, encouraging a mindful mindset shift, and enhancing relationships. 
Conclusions: In essence, this research identified 5 themes on which mindfulness conversational agents could be designed to meet the needs of persons with dementia and their caregivers. UR - https://aging.jmir.org/2022/4/e40360 UR - http://dx.doi.org/10.2196/40360 UR - http://www.ncbi.nlm.nih.gov/pubmed/36472897 ID - info:doi/10.2196/40360 ER - TY - JOUR AU - Rebelo, Nathanael AU - Sanders, Leslie AU - Li, Kay AU - Chow, L. James C. PY - 2022/12/2 TI - Learning the Treatment Process in Radiotherapy Using an Artificial Intelligence-Assisted Chatbot: Development Study JO - JMIR Form Res SP - e39443 VL - 6 IS - 12 KW - chatbot KW - artificial intelligence KW - machine learning KW - radiotherapy chain KW - radiation treatment process KW - communication KW - diagnosis KW - cancer therapy KW - internet of things KW - radiation oncology KW - medical physics KW - health care N2 - Background: In knowledge transfer for educational purposes, most cancer hospital or center websites have existing information on cancer health. However, such information is usually a list of topics that are neither interactive nor customized to offer any personal touch to people facing a dire health crisis, nor do they attempt to understand the concerns of the users. Patients with cancer, their families, and the general public accessing the information are often in challenging, stressful situations, wanting to access accurate information as efficiently as possible. In addition, there is seldom any comprehensive information specifically on radiotherapy, despite the large number of older patients with cancer who must go through the treatment process. Therefore, having someone with professional knowledge who can listen to them and provide medical information with good will and encouragement would help patients and families struggling with critical illness, particularly during the lingering pandemic.
Objective: This study created a novel virtual assistant, a chatbot that can explain the radiation treatment process to stakeholders comprehensively and accurately, in the absence of any similar software. This chatbot was created using the IBM Watson Assistant with artificial intelligence and machine learning features. The chatbot or bot was incorporated into a resource that can be easily accessed by the general public. Methods: The radiation treatment process in a cancer hospital or center was described by the radiotherapy chain: patient diagnosis, consultation, and prescription; patient positioning, immobilization, and simulation; 3D imaging for treatment planning; target and organ contouring; radiation treatment planning; patient setup and plan verification; and treatment delivery. The bot was created using the IBM Watson Assistant (IBM Corp). The natural language processing feature in the Watson platform allowed the bot to flow through a given conversation structure and recognize how the user responds based on recognition of similar given examples, referred to as intents during development. Therefore, the bot can be trained using the responses received, by recognizing similar responses from the user and analyzing them using Watson natural language processing. Results: The bot is hosted on a website by the Watson application programming interface. It is capable of guiding the user through the conversation structure and can respond to simple questions and provide resources for requests for information that was not directly programmed into the bot. The bot was tested by potential users, and the overall averages of the identified metrics are excellent. The bot can also acquire users' feedback for further improvements in the routine update. Conclusions: An artificial intelligence-assisted chatbot was created for knowledge transfer regarding the radiation treatment process to patients with cancer, their families, and the general public.
The bot, which is supported by machine learning, was tested and found to provide information about radiotherapy effectively. UR - https://formative.jmir.org/2022/12/e39443 UR - http://dx.doi.org/10.2196/39443 UR - http://www.ncbi.nlm.nih.gov/pubmed/36327383 ID - info:doi/10.2196/39443 ER - TY - JOUR AU - Nicol, Ginger AU - Wang, Ruoyun AU - Graham, Sharon AU - Dodd, Sherry AU - Garbutt, Jane PY - 2022/11/22 TI - Chatbot-Delivered Cognitive Behavioral Therapy in Adolescents With Depression and Anxiety During the COVID-19 Pandemic: Feasibility and Acceptability Study JO - JMIR Form Res SP - e40242 VL - 6 IS - 11 KW - COVID-19 KW - adolescent depression KW - mobile health KW - cognitive behavioral therapy KW - chatbot KW - relational conversational agent KW - depression KW - anxiety KW - suicide KW - self-harm KW - pandemic KW - pediatric KW - youth KW - adolescent KW - adolescence KW - psychiatry KW - conversational agent KW - CBT KW - clinic KW - data KW - acceptability KW - feasibility KW - usability KW - primary care KW - intervention KW - mental health KW - digital health KW - technology mediated KW - computer mediated N2 - Background: Symptoms of depression and anxiety, suicidal ideation, and self-harm have escalated among adolescents to crisis levels during the COVID-19 pandemic. As a result, primary care providers (PCPs) are often called on to provide first-line care for these youth. Digital health interventions can extend mental health specialty care, but few are evidence based. We evaluated the feasibility of delivering an evidence-based mobile health (mHealth) app with an embedded conversational agent to deliver cognitive behavioral therapy (CBT) to symptomatic adolescents presenting in primary care settings during the pandemic.
Objective: In this 12-week pilot study, we evaluated the feasibility of delivering the app-based intervention to adolescents aged 13 to 17 years with moderate depressive symptoms who were treated in a practice-based research network (PBRN) of academically affiliated primary care clinics. We also obtained preliminary estimates of app acceptability, effectiveness, and usability. Methods: This small, pilot randomized controlled trial (RCT) evaluated depressive symptom severity in adolescents randomized to the app or to a wait list control condition. The primary end point was depression severity at 4 weeks, measured by the 9-item Patient Health Questionnaire (PHQ-9). Data on acceptability, feasibility, and usability were collected from adolescents and their parent or legal guardian. Qualitative interviews were conducted with 13 PCPs from 11 PBRN clinics to identify facilitators and barriers to incorporating mental health apps in treatment planning for adolescents with depression and anxiety. Results: The pilot randomized 18 participants to the app (n=10, 56%) or to a wait list control condition (n=8, 44%); 17 participants were included in the analysis, and 1 was excluded upon chart review because the documented diagnosis did not meet eligibility criteria. The overall sample was predominantly female (15/17, 88%), White (15/17, 88%), and privately insured (15/17, 88%). Mean PHQ-9 scores at 4 weeks decreased by 3.3 points in the active treatment group (representing a shift in mean depression score from the moderate to the mild symptom severity category) and 2 points in the wait list control group (no shift in symptom severity category). Teen- and parent-reported usability, feasibility, and acceptability of the app were high. PCPs reported a preference for introducing mHealth interventions like the one in this study early in the course of care for individuals presenting with mild or moderate symptoms.
Conclusions: In this small study, we demonstrated the feasibility, acceptability, usability, and safety of using a CBT-based chatbot for adolescents presenting with moderate depressive symptoms in a network of PBRN-based primary care clinics. This pilot study could not establish effectiveness, but our results suggest that further study in a larger pediatric population is warranted. Future study inclusive of rural, socioeconomically disadvantaged, and underrepresented communities is needed to establish generalizability of effectiveness and identify implementation-related adaptations needed to promote broader uptake in pediatric primary care. Trial Registration: ClinicalTrials.gov NCT04603053; https://clinicaltrials.gov/ct2/show/NCT04603053 UR - https://formative.jmir.org/2022/11/e40242 UR - http://dx.doi.org/10.2196/40242 UR - http://www.ncbi.nlm.nih.gov/pubmed/36413390 ID - info:doi/10.2196/40242 ER - TY - JOUR AU - He, Yuhao AU - Yang, Li AU - Zhu, Xiaokun AU - Wu, Bin AU - Zhang, Shuo AU - Qian, Chunlian AU - Tian, Tian PY - 2022/11/21 TI - Mental Health Chatbot for Young Adults With Depressive Symptoms During the COVID-19 Pandemic: Single-Blind, Three-Arm Randomized Controlled Trial JO - J Med Internet Res SP - e40719 VL - 24 IS - 11 KW - chatbot KW - conversational agent KW - depression KW - mental health KW - mHealth KW - digital medicine KW - randomized controlled trial KW - evaluation KW - cognitive behavioral therapy KW - young adult KW - youth KW - health service KW - mobile health KW - COVID-19 N2 - Background: Depression has a high prevalence among young adults, especially during the COVID-19 pandemic. However, mental health services remain scarce and underutilized worldwide. Mental health chatbots are a novel digital technology to provide fully automated interventions for depressive symptoms. 
Objective: The purpose of this study was to test the clinical effectiveness and nonclinical performance of a cognitive behavioral therapy (CBT)-based mental health chatbot (XiaoE) for young adults with depressive symptoms during the COVID-19 pandemic. Methods: In a single-blind, 3-arm randomized controlled trial, participants manifesting depressive symptoms recruited from a Chinese university were randomly assigned to a mental health chatbot (XiaoE; n=49), an e-book (n=49), or a general chatbot (Xiaoai; n=50) group in a ratio of 1:1:1. Participants received a 1-week intervention. The primary outcome was the reduction of depressive symptoms according to the 9-item Patient Health Questionnaire (PHQ-9) at 1 week later (T1) and 1 month later (T2). Both intention-to-treat and per-protocol analyses were conducted under analysis of covariance models adjusting for baseline data. Controlled multiple imputation and δ-based sensitivity analysis were performed for missing data. The secondary outcomes were the level of working alliance measured using the Working Alliance Questionnaire (WAQ), usability measured using the Usability Metric for User Experience-LITE (UMUX-LITE), and acceptability measured using the Acceptability Scale (AS). Results: Participants were on average 18.78 years old, and 37.2% (55/148) were female. The mean baseline PHQ-9 score was 10.02 (SD 3.18; range 2-19). Intention-to-treat analysis revealed lower PHQ-9 scores among participants in the XiaoE group compared with participants in the e-book group and Xiaoai group at both T1 (F2,136=17.011; P<.001; d=0.51) and T2 (F2,136=5.477; P=.005; d=0.31). Better working alliance (WAQ; F2,145=3.407; P=.04) and acceptability (AS; F2,145=4.322; P=.02) were discovered with XiaoE, while no significant difference among arms was found for usability (UMUX-LITE; F2,145=0.968; P=.38).
Conclusions: A CBT-based chatbot is a feasible and engaging digital therapeutic approach that allows easy accessibility and self-guided mental health assistance for young adults with depressive symptoms. A systematic evaluation of nonclinical metrics for a mental health chatbot has been established in this study. In the future, focus on both clinical outcomes and nonclinical metrics is necessary to explore the mechanism by which mental health chatbots work on patients. Further evidence is required to confirm the long-term effectiveness of the mental health chatbot via trials replicated with a longer dose, as well as exploration of its stronger efficacy in comparison with other active controls. Trial Registration: Chinese Clinical Trial Registry ChiCTR2100052532; http://www.chictr.org.cn/showproj.aspx?proj=135744 UR - https://www.jmir.org/2022/11/e40719 UR - http://dx.doi.org/10.2196/40719 UR - http://www.ncbi.nlm.nih.gov/pubmed/36355633 ID - info:doi/10.2196/40719 ER - TY - JOUR AU - Kocaballi, Baki Ahmet AU - Sezgin, Emre AU - Clark, Leigh AU - Carroll, M. John AU - Huang, Yungui AU - Huh-Yoo, Jina AU - Kim, Junhan AU - Kocielnik, Rafal AU - Lee, Yi-Chieh AU - Mamykina, Lena AU - Mitchell, G. Elliot AU - Moore, J. Robert AU - Murali, Prasanth AU - Mynatt, D. Elizabeth AU - Park, Young Sun AU - Pasta, Alessandro AU - Richards, Deborah AU - Silva, M. Lucas AU - Smriti, Diva AU - Spillane, Brendan AU - Zhang, Zhan AU - Zubatiy, Tamara PY - 2022/11/15 TI - Design and Evaluation Challenges of Conversational Agents in Health Care and Well-being: Selective Review Study JO - J Med Internet Res SP - e38525 VL - 24 IS - 11 KW - conversational interfaces KW - conversational agents KW - dialog systems KW - health care KW - well-being N2 - Background: Health care and well-being are 2 main interconnected application areas of conversational agents (CAs). There is a significant increase in research, development, and commercial implementations in this area.
In parallel to the increasing interest, new challenges in designing and evaluating CAs have emerged. Objective: This study aims to identify key design, development, and evaluation challenges of CAs in health care and well-being research. The focus is on the very recent projects with their emerging challenges. Methods: A review study was conducted with 17 invited studies, most of which were presented at the ACM (Association for Computing Machinery) CHI 2020 conference workshop on CAs for health and well-being. Eligibility criteria required the studies to involve a CA applied to a health or well-being project (ongoing or recently finished). The participating studies were asked to report on their projects' design and evaluation challenges. We used thematic analysis to review the studies. Results: The findings include a range of topics from primary care to caring for older adults to health coaching. We identified 4 major themes: (1) Domain Information and Integration, (2) User-System Interaction and Partnership, (3) Evaluation, and (4) Conversational Competence. Conclusions: CAs proved their worth during the pandemic as health screening tools, and are expected to stay to further support various health care domains, especially personal health care. Growth in investment in CAs also shows their value as personal assistants. Our study shows that while some challenges are shared with other CA application areas, safety and privacy remain the major challenges in the health care and well-being domains. An increased level of collaboration across different institutions and entities may be a promising direction to address some of the major challenges that otherwise would be too complex to be addressed by the projects with their limited scope and budget.
UR - https://www.jmir.org/2022/11/e38525 UR - http://dx.doi.org/10.2196/38525 UR - http://www.ncbi.nlm.nih.gov/pubmed/36378515 ID - info:doi/10.2196/38525 ER - TY - JOUR AU - Li, Xingyi AU - Xie, Shirong AU - Ye, Zhengqiang AU - Ma, Shishi AU - Yu, Guangjun PY - 2022/11/7 TI - Investigating Patients' Continuance Intention Toward Conversational Agents in Outpatient Departments: Cross-sectional Field Survey JO - J Med Internet Res SP - e40681 VL - 24 IS - 11 KW - conversational agent KW - continuance intention KW - expectation-confirmation model KW - partial least squares KW - structural equation modeling KW - chatbot KW - virtual assistant KW - cross-sectional KW - field study KW - optimization KW - outpatient KW - interview KW - qualitative KW - questionnaire KW - satisfaction KW - perceived usefulness KW - intention KW - adoption KW - attitude KW - perception N2 - Background: Conversational agents (CAs) have been developed in outpatient departments to improve physician-patient communication efficiency. As end users, patients' continuance intention is essential for the sustainable development of CAs. Objective: The aim of this study was to facilitate the successful usage of CAs by identifying key factors influencing patients' continuance intention and proposing corresponding managerial implications. Methods: This study proposed an extended expectation-confirmation model and empirically tested the model via a cross-sectional field survey. The questionnaire included demographic characteristics, multiple-item scales, and an optional open-ended question on patients' specific expectations for CAs. Partial least squares structural equation modeling was applied to assess the model and hypotheses. The qualitative data were analyzed via thematic analysis. Results: A total of 172 completed questionnaires were received, with a 100% (172/172) response rate. The proposed model explained 75.5% of the variance in continuance intention.
Both satisfaction (β=.68; P<.001) and perceived usefulness (β=.221; P=.004) were significant predictors of continuance intention. Patients' extent of confirmation significantly and positively affected both perceived usefulness (β=.817; P<.001) and satisfaction (β=.61; P<.001). Contrary to expectations, perceived ease of use had no significant impact on perceived usefulness (β=.048; P=.37), satisfaction (β=−.004; P=.63), and continuance intention (β=.026; P=.91). The following three themes were extracted from the 74 answers to the open-ended question: personalized interaction, effective utilization, and clear illustrations. Conclusions: This study identified key factors influencing patients' continuance intention toward CAs. Satisfaction and perceived usefulness were significant predictors of continuance intention (P<.001 and P=.004, respectively) and were significantly affected by patients' extent of confirmation (P<.001 and P<.001, respectively). Developing a better understanding of patients' continuance intention can help administrators figure out how to facilitate the effective implementation of CAs. Efforts should be made toward improving the aspects that patients reasonably expect CAs to have, which include personalized interactions, effective utilization, and clear illustrations.
UR - https://www.jmir.org/2022/11/e40681 UR - http://dx.doi.org/10.2196/40681 UR - http://www.ncbi.nlm.nih.gov/pubmed/36342768 ID - info:doi/10.2196/40681 ER - TY - JOUR AU - Ludin, Nicola AU - Holt-Quick, Chester AU - Hopkins, Sarah AU - Stasiak, Karolina AU - Hetrick, Sarah AU - Warren, Jim AU - Cargo, Tania PY - 2022/11/4 TI - A Chatbot to Support Young People During the COVID-19 Pandemic in New Zealand: Evaluation of the Real-World Rollout of an Open Trial JO - J Med Internet Res SP - e38743 VL - 24 IS - 11 KW - COVID-19 KW - youth KW - chatbots KW - adolescent mental health KW - dialog-based intervention KW - digital mental health N2 - Background: The number of young people in New Zealand (Aotearoa) who experience mental health challenges is increasing. As those in Aotearoa went into the initial COVID-19 lockdown, an ongoing digital mental health project was adapted and underwent rapid content authoring to create the Aroha chatbot. This dynamic digital support was designed with and for young people to help manage pandemic-related worry. Objective: Aroha was developed to provide practical evidence-based tools for anxiety management using cognitive behavioral therapy and positive psychology. The chatbot included practical ideas to maintain social and cultural connection, and to stay active and well. Methods: Stay-at-home orders under Aotearoa's lockdown commenced on March 20, 2020. By leveraging previously developed chatbot technology and broader existing online trial infrastructure, the Aroha chatbot was launched promptly on April 7, 2020. Dissemination of the chatbot for an open trial was via a URL, and feedback on the experience of the lockdown and the experience of Aroha was gathered via online questionnaires and a focus group, and from community members. Results: In the 2 weeks following the launch of the chatbot, there were 393 registrations, and 238 users logged into the chatbot, of whom 127 were in the target age range (13-24 years).
Feedback guided iterative and responsive content authoring to suit the dynamic situation and motivated engineering to dynamically detect and react to a range of conversational intents. Conclusions: The experience of the implementation of the Aroha chatbot highlights the feasibility of providing timely event-specific digital mental health support and the technology requirements for a flexible and enabling chatbot architectural framework. UR - https://www.jmir.org/2022/11/e38743 UR - http://dx.doi.org/10.2196/38743 UR - http://www.ncbi.nlm.nih.gov/pubmed/36219754 ID - info:doi/10.2196/38743 ER - TY - JOUR AU - Schick, Anita AU - Feine, Jasper AU - Morana, Stefan AU - Maedche, Alexander AU - Reininghaus, Ulrich PY - 2022/10/31 TI - Validity of Chatbot Use for Mental Health Assessment: Experimental Study JO - JMIR Mhealth Uhealth SP - e28082 VL - 10 IS - 10 KW - chatbot KW - distress KW - monitoring KW - mobile health KW - social desirability KW - social presence N2 - Background: Mental disorders in adolescence and young adulthood are major public health concerns. Digital tools such as text-based conversational agents (ie, chatbots) are a promising technology for facilitating mental health assessment. However, the human-like interaction style of chatbots may induce potential biases, such as socially desirable responding (SDR), and may require further effort to complete assessments. Objective: This study aimed to investigate the convergent and discriminant validity of chatbots for mental health assessments, the effect of assessment mode on SDR, and the effort required by participants for assessments using chatbots compared with established modes. 
Methods: In a counterbalanced within-subject design, we assessed 2 different constructs, psychological distress (Kessler Psychological Distress Scale and Brief Symptom Inventory-18) and problematic alcohol use (Alcohol Use Disorders Identification Test-3), in 3 modes (chatbot, paper-and-pencil, and web-based), and examined convergent and discriminant validity. In addition, we investigated the effect of mode on SDR, controlling for perceived sensitivity of items and individuals' tendency to respond in a socially desirable way, and we also assessed the perceived social presence of modes. Including a between-subject condition, we further investigated whether SDR is increased in chatbot assessments when applied in a self-report setting versus when human interaction may be expected. Finally, the effort (ie, complexity, difficulty, burden, and time) required to complete the assessments was investigated. Results: A total of 146 young adults (mean age 24, SD 6.42 years; n=67, 45.9% female) were recruited from a research panel for laboratory experiments. The results revealed high positive correlations (all P<.001) of measures of the same construct across different modes, indicating the convergent validity of chatbot assessments. Furthermore, there were no correlations between the distinct constructs, indicating discriminant validity. Moreover, there were no differences in SDR between modes and whether human interaction was expected, although the perceived social presence of the chatbot mode was higher than that of the established modes (P<.001). Finally, greater effort (all P<.05) and more time were needed to complete chatbot assessments than for completing the established modes (P<.001). Conclusions: Our findings suggest that chatbots may yield valid results. Furthermore, an understanding of chatbot design trade-offs in terms of potential strengths (ie, increased social presence) and limitations (ie, increased effort) when assessing mental health was established. 
UR - https://mhealth.jmir.org/2022/10/e28082 UR - http://dx.doi.org/10.2196/28082 UR - http://www.ncbi.nlm.nih.gov/pubmed/36315228 ID - info:doi/10.2196/28082 ER - TY - JOUR AU - Okonkwo, Wilfred Chinedu AU - Amusa, Babatunde Lateef AU - Twinomurinzi, Hossana PY - 2022/10/27 TI - COVID-Bot, an Intelligent System for COVID-19 Vaccination Screening: Design and Development JO - JMIR Form Res SP - e39157 VL - 6 IS - 10 KW - chatbot KW - COVID-Bot KW - COVID-19 KW - students KW - vaccine KW - exemption letter KW - vaccination KW - artificial intelligence N2 - Background: Coronavirus continues to spread worldwide, causing various health and economic disruptions. One of the most important approaches to controlling the spread of this disease is to use an artificial intelligence (AI)-based technological intervention, such as a chatbot system. Chatbots can aid in the fight against the spread of COVID-19. Objective: This paper introduces COVID-Bot, an intelligent interactive system that can help screen students and confirm their COVID-19 vaccination status. Methods: The design and development of COVID-Bot followed the principles of the design science research (DSR) process, which is a research method for creating a new scientific artifact. COVID-Bot was developed and implemented using the SnatchBot chatbot application programming interface (API) and its predefined tools, which are driven by various natural language processing algorithms. Results: An evaluation was carried out through a survey that involved 106 university students in determining the functionality, compatibility, reliability, and usability of COVID-Bot. The findings indicated that 92 (86.8%) of the participants agreed that the chatbot functions well, 85 (80.2%) agreed that it fits well with their mobile devices and their lifestyle, 86 (81.1%) agreed that it has the potential to produce accurate and consistent responses, and 85 (80.2%) agreed that it is easy to use. The average obtained α 
was .87, indicating satisfactory reliability. Conclusions: This study demonstrates that incorporating chatbot technology into the educational system can combat the spread of COVID-19 among university students. The intelligent system does this by interacting with students to determine their vaccination status. UR - https://formative.jmir.org/2022/10/e39157 UR - http://dx.doi.org/10.2196/39157 UR - http://www.ncbi.nlm.nih.gov/pubmed/36301616 ID - info:doi/10.2196/39157 ER - TY - JOUR AU - Pithpornchaiyakul, Samerchit AU - Naorungroj, Supawadee AU - Pupong, Kittiwara AU - Hunsrisakhun, Jaranya PY - 2022/10/21 TI - Using a Chatbot as an Alternative Approach for In-Person Toothbrushing Training During the COVID-19 Pandemic: Comparative Study JO - J Med Internet Res SP - e39218 VL - 24 IS - 10 KW - mHealth KW - tele-dentistry KW - digital health KW - chatbot KW - conversational agents KW - oral hygiene KW - oral health behaviors KW - protection motivation theory KW - young children KW - caregiver KW - in-person toothbrushing training KW - COVID-19 N2 - Background: It is recommended that caregivers receive oral health education and in-person training to improve toothbrushing for young children. To strengthen oral health education before COVID-19, the 21-Day FunDee chatbot with in-person toothbrushing training for caregivers was used. During the pandemic, practical experience was difficult to implement. Therefore, the 30-Day FunDee chatbot was created to extend the coverage of chatbots from 21 days to 30 days by incorporating more videos on toothbrushing demonstrations and dialogue. This was a secondary data comparison of 2 chatbots in similar rural areas of Pattani province: Maikan district (Study I) and Maelan district (Study II). Objective: This study aimed to evaluate the effectiveness and usability of 2 chatbots, 21-Day FunDee (Study I) and 30-Day FunDee (Study II), based on the protection motivation theory (PMT). 
This study explored the feasibility of using the 30-Day FunDee chatbot to increase toothbrushing behaviors for caregivers in oral hygiene care for children aged 6 months to 36 months without in-person training during the COVID-19 pandemic. Methods: A pre-post design was used in both studies. The effectiveness was evaluated among caregivers in terms of oral hygiene practices, knowledge, and oral health care perceptions based on PMT. In Study I, participants received in-person training and a 21-day chatbot course during October 2018 to February 2019. In Study II, participants received only daily chatbot programming for 30 days during December 2021 to February 2022. Data were gathered at baseline of each study and at 30 days and 60 days after the start of Study I and Study II, respectively. After completing their interventions, the chatbot's usability was assessed using open-ended questions. Study I evaluated the plaque score, whereas Study II included an in-depth interview. The 2 studies were compared to determine the feasibility of using the 30-Day FunDee chatbot as an alternative to in-person training. Results: There were 71 pairs of participants: 37 in Study I and 34 in Study II. Both chatbots significantly improved overall knowledge (Study I: P<.001; Study II: P=.001), overall oral health care perceptions based on PMT (Study I: P<.001; Study II: P<.001), and toothbrushing for children by caregivers (Study I: P=.02; Study II: P=.04). Only Study I had statistically significant differences in toothbrushing at least twice a day (P=.002) and perceived vulnerability (P=.003). The highest overall chatbot satisfaction was 9.2 (SD 0.9) in Study I and 8.6 (SD 1.2) in Study II. In Study I, plaque levels differed significantly (P<.001). Conclusions: This was the first study using a chatbot in oral health education. We established the effectiveness and usability of 2 chatbot programs for promoting oral hygiene care of young children by caregivers. 
The 30-Day FunDee chatbot showed the possibility of improving toothbrushing skills without requiring in-person training. Trial Registration: Thai Clinical Trials Registry TCTR20191223005; http://www.thaiclinicaltrials.org/show/TCTR20191223005 and TCTR20210927004; https://www.thaiclinicaltrials.org/show/TCTR20210927004 UR - https://www.jmir.org/2022/10/e39218 UR - http://dx.doi.org/10.2196/39218 UR - http://www.ncbi.nlm.nih.gov/pubmed/36179147 ID - info:doi/10.2196/39218 ER - TY - JOUR AU - Moya-Galé, Gemma AU - Walsh, J. Stephen AU - Goudarzi, Alireza PY - 2022/10/20 TI - Automatic Assessment of Intelligibility in Noise in Parkinson Disease: Validation Study JO - J Med Internet Res SP - e40567 VL - 24 IS - 10 KW - automatic speech recognition KW - Parkinson disease KW - intelligibility KW - dysarthria KW - digital health KW - artificial intelligence N2 - Background: Most individuals with Parkinson disease (PD) experience a degradation in their speech intelligibility. Research on the use of automatic speech recognition (ASR) to assess intelligibility is still sparse, especially when trying to replicate communication challenges in real-life conditions (ie, noisy backgrounds). Developing technologies to automatically measure intelligibility in noise can ultimately assist patients in self-managing their voice changes due to the disease. Objective: The goal of this study was to pilot-test and validate the use of a customized web-based app to assess speech intelligibility in noise in individuals with dysarthria associated with PD. Methods: In total, 20 individuals with dysarthria associated with PD and 20 healthy controls (HCs) recorded a set of sentences using their phones. The Google Cloud ASR API was used to automatically transcribe the speakers' sentences. An algorithm was created to embed speakers' sentences in +6-dB signal-to-noise multitalker babble. 
Results from ASR performance were compared to those from 30 listeners who orthographically transcribed the same set of sentences. Data were reduced into a single event, defined as a success if the artificial intelligence (AI) system transcribed a random speaker or sentence as well or better than the average of 3 randomly chosen human listeners. These data were further analyzed by logistic regression to assess whether AI success differed by speaker group (HCs or speakers with dysarthria) or was affected by sentence length. A discriminant analysis was conducted on the human listener data and AI transcriber data independently to compare the ability of each data set to discriminate between HCs and speakers with dysarthria. Results: The data analysis indicated a 0.8 probability (95% CI 0.65-0.91) that AI performance would be as good or better than the average human listener. AI transcriber success probability was not found to be dependent on speaker group. AI transcriber success was found to decrease with sentence length, losing an estimated 0.03 probability of transcribing as well as the average human listener for each word increase in sentence length. The AI transcriber data were found to offer the same discrimination of speakers into categories (HCs and speakers with dysarthria) as the human listener data. Conclusions: ASR has the potential to assess intelligibility in noise in speakers with dysarthria associated with PD. Our results hold promise for the use of AI with this clinical population, although a full range of speech severity needs to be evaluated in future work, as well as the effect of different speaking tasks on ASR. 
UR - https://www.jmir.org/2022/10/e40567 UR - http://dx.doi.org/10.2196/40567 UR - http://www.ncbi.nlm.nih.gov/pubmed/36264608 ID - info:doi/10.2196/40567 ER - TY - JOUR AU - Goonesekera, Yenushka AU - Donkin, Liesje PY - 2022/10/20 TI - A Cognitive Behavioral Therapy Chatbot (Otis) for Health Anxiety Management: Mixed Methods Pilot Study JO - JMIR Form Res SP - e37877 VL - 6 IS - 10 KW - health anxiety KW - conversational agent KW - illness anxiety disorder KW - COVID-19 KW - iCBT KW - user experience KW - anthropomorphism N2 - Background: An increase in health anxiety was observed during the COVID-19 pandemic. However, due to physical distancing restrictions and a strained mental health system, people were unable to access support to manage health anxiety. Chatbots are emerging as an interactive means to deliver psychological interventions in a scalable manner and provide an opportunity for novel therapy delivery to large groups of people, including those who might struggle to access traditional therapies. Objective: The aim of this mixed methods pilot study was to investigate the feasibility, acceptability, engagement, and effectiveness of a cognitive behavioral therapy (CBT)-based chatbot (Otis) as an early health anxiety management intervention for adults in New Zealand during the COVID-19 pandemic. Methods: Users were asked to complete a 14-day program run by Otis, a primarily decision tree-based chatbot on Facebook Messenger. Health anxiety, general anxiety, intolerance of uncertainty, personal well-being, and quality of life were measured preintervention, postintervention, and at a 12-week follow-up. Paired samples t tests and 1-way ANOVAs were conducted to investigate the associated changes in the outcomes over time. Semistructured interviews and written responses in the self-report questionnaires and Facebook Messenger were thematically analyzed. 
Results: The trial was completed by 29 participants who provided outcome measures at both postintervention and follow-up. Although an average decrease in health anxiety did not reach significance at postintervention (P=.55) or follow-up (P=.08), qualitative analysis demonstrated that participants perceived that they benefited from the intervention. Significant improvement in general anxiety, personal well-being, and quality of life was associated with the use of Otis at postintervention and follow-up. Anthropomorphism, Otis' appearance, and delivery of content facilitated the use of Otis. Technical difficulties and high performance and effort expectancy were, in contrast, barriers to acceptance and engagement of Otis. Conclusions: Otis may be a feasible, acceptable, and engaging means of delivering CBT to improve anxiety management, quality of life, and personal well-being but might not significantly reduce health anxiety. UR - https://formative.jmir.org/2022/10/e37877 UR - http://dx.doi.org/10.2196/37877 UR - http://www.ncbi.nlm.nih.gov/pubmed/36150049 ID - info:doi/10.2196/37877 ER - TY - JOUR AU - Daniel, Thomas AU - de Chevigny, Alix AU - Champrigaud, Adeline AU - Valette, Julie AU - Sitbon, Marine AU - Jardin, Meryam AU - Chevalier, Delphine AU - Renet, Sophie PY - 2022/10/11 TI - Answering Hospital Caregivers' Questions at Any Time: Proof-of-Concept Study of an Artificial Intelligence-Based Chatbot in a French Hospital JO - JMIR Hum Factors SP - e39102 VL - 9 IS - 4 KW - chatbot KW - artificial intelligence KW - pharmacy KW - hospital KW - health care KW - drugs KW - medication KW - information quality KW - health information KW - caregiver KW - healthcare staff KW - digital health tool KW - COVID-19 KW - information technology N2 - Background: Access to accurate information in health care is a key point for caregivers to avoid medication errors, especially with the reorganization of staff and drug circuits during health crises such as the COVID-19 pandemic. 
It is, therefore, the role of the hospital pharmacy to answer caregivers' questions. Some may require the expertise of a pharmacist, some should be answered by pharmacy technicians, but others are simple and redundant, and automated responses may be provided. Objective: We aimed to develop and implement a chatbot to answer questions from hospital caregivers about drugs and pharmacy organization 24 hours a day and to evaluate this tool. Methods: The ADDIE (Analysis, Design, Development, Implementation, and Evaluation) model was used by a multiprofessional team composed of 3 hospital pharmacists, 2 members of the Innovation and Transformation Department, and the IT service provider. Based on an analysis of the caregivers' needs about drugs and pharmacy organization, we designed and developed a chatbot. The tool was then evaluated before its implementation into the hospital intranet. Its relevance and conversations with testers were monitored via the IT provider's back office. Results: Needs analysis with 5 hospital pharmacists and 33 caregivers from 5 health services allowed us to identify 7 themes about drugs and pharmacy organization (such as opening hours and specific prescriptions). After a year of chatbot design and development, the test version obtained good evaluation scores: its speed was rated 8.2 out of 10, usability 8.1 out of 10, and appearance 7.5 out of 10. Testers were generally satisfied (70%) and were hoping for the content to be enhanced. Conclusions: The chatbot seems to be a relevant tool for hospital caregivers, helping them obtain reliable and verified information they need on drugs and pharmacy organization. In the context of significant mobility of nursing staff during the health crisis due to the COVID-19 pandemic, the chatbot could be a suitable tool for transmitting relevant information related to drug circuits or specific procedures. To our knowledge, this is the first time that such a tool has been designed for caregivers. 
Its development further continued by means of tests conducted with other users such as pharmacy technicians and via the integration of additional data before the implementation on the 2 hospital sites. UR - https://humanfactors.jmir.org/2022/4/e39102 UR - http://dx.doi.org/10.2196/39102 UR - http://www.ncbi.nlm.nih.gov/pubmed/35930555 ID - info:doi/10.2196/39102 ER - TY - JOUR AU - Sasseville, Maxime AU - Barony Sanchez, H. Romina AU - Yameogo, R. Achille AU - Bergeron-Drolet, Laurie-Ann AU - Bergeron, Frédéric AU - Gagnon, Marie-Pierre PY - 2022/10/11 TI - Interactive Conversational Agents for Health Promotion, Prevention, and Care: Protocol for a Mixed Methods Systematic Scoping Review JO - JMIR Res Protoc SP - e40265 VL - 11 IS - 10 KW - conversational agents KW - chatbots KW - scoping review KW - literature review KW - healthcare KW - health care KW - health promotion KW - prevention KW - care KW - computer KW - natural language processing KW - literature KW - community N2 - Background: Interactive conversational agents, also known as "chatbots," are computer programs that use natural language processing to engage in conversations with humans to provide or collect information. Although the literature on the development and use of chatbots for health interventions is growing, important knowledge gaps remain, such as identifying design aspects relevant to health care and functions to offer transparency in decision-making automation. Objective: This paper presents the protocol for a scoping review that aims to identify and categorize the interactive conversational agents currently used in health care. Methods: A mixed methods systematic scoping review will be conducted according to the Arksey and O'Malley framework and the guidance of Peters et al for systematic scoping reviews. A specific search strategy will be formulated for 5 of the most relevant databases to identify studies published in the last 20 years. 
Two reviewers will independently apply the inclusion criteria using the full texts and extract data. We will use structured narrative summaries of main themes to present a portrait of the current scope of available interactive conversational agents targeting health promotion, prevention, and care. We will also summarize the differences and similarities between these conversational agents. Results: The search strategy and screening steps were completed in March 2022. Data extraction and analysis started in May 2022, and the results are expected to be published in October 2022. Conclusions: This fundamental knowledge will be useful for the development of interactive conversational agents adapted to specific groups in vulnerable situations in health care and community settings. International Registered Report Identifier (IRRID): DERR1-10.2196/40265 UR - https://www.researchprotocols.org/2022/10/e40265 UR - http://dx.doi.org/10.2196/40265 UR - http://www.ncbi.nlm.nih.gov/pubmed/36222804 ID - info:doi/10.2196/40265 ER - TY - JOUR AU - Smriti, Diva AU - Kao, Annie Tsui-Sui AU - Rathod, Rahil AU - Shin, Youn Ji AU - Peng, Wei AU - Williams, Jake AU - Mujib, Ishad Munif AU - Colosimo, Meghan AU - Huh-Yoo, Jina PY - 2022/10/7 TI - Motivational Interviewing Conversational Agent for Parents as Proxies for Their Children in Healthy Eating: Development and User Testing JO - JMIR Hum Factors SP - e38908 VL - 9 IS - 4 KW - conversational agents KW - voice user interface KW - voice agents KW - proxy KW - motivational interviewing KW - parents KW - healthy eating N2 - Background: Increased adoption of off-the-shelf conversational agents (CAs) brings opportunities to integrate therapeutic interventions. Motivational Interviewing (MI) can then be integrated with CAs for cost-effective access to it. MI can be especially beneficial for parents who often have low motivation because of limited time and resources to eat healthy together with their children. 
Objective: We developed a Motivational Interviewing Conversational Agent (MICA) to improve healthy eating in parents who serve as a proxy for health behavior change in their children. Proxy relationships involve a person serving as a catalyst for behavior change in another person. Parents, serving as proxies, can bring about behavior change in their children. Methods: We conducted user test sessions of the MICA prototype to understand the perceived acceptability and usefulness of the MICA prototype by parents. A total of 24 parents of young children participated in 2 user test sessions with MICA, approximately 2 weeks apart. After parents' interaction with the MICA prototype in each user test session, we used qualitative interviews to understand parents' perceptions and suggestions for improvements in MICA. Results: Findings showed participants' perceived usefulness of MICAs for helping them self-reflect and motivating them to adopt healthier eating habits together with their children. Participants further suggested various ways in which MICA can help them safely manage their children's eating behaviors and provide customized support for their proxy needs and goals. Conclusions: We have discussed how the user experience of CAs can be improved to uniquely offer support to parents who serve as proxies in changing the behavior of their children. We have concluded with implications for a larger context of designing MI-based CAs for supporting proxy relationships for health behavior change. UR - https://humanfactors.jmir.org/2022/4/e38908 UR - http://dx.doi.org/10.2196/38908 UR - http://www.ncbi.nlm.nih.gov/pubmed/36206036 ID - info:doi/10.2196/38908 ER - TY - JOUR AU - Peng, L. Mary AU - Wickersham, A. Jeffrey AU - Altice, L. 
Frederick AU - Shrestha, Roman AU - Azwa, Iskandar AU - Zhou, Xin AU - Halim, Ab Mohd Akbar AU - Ikhtiaruddin, Mohd Wan AU - Tee, Vincent AU - Kamarulzaman, Adeeba AU - Ni, Zhao PY - 2022/10/6 TI - Formative Evaluation of the Acceptance of HIV Prevention Artificial Intelligence Chatbots By Men Who Have Sex With Men in Malaysia: Focus Group Study JO - JMIR Form Res SP - e42055 VL - 6 IS - 10 KW - artificial intelligence KW - chatbot KW - HIV prevention KW - implementation science KW - men who have sex with men KW - MSM KW - mobile health design KW - mHealth design KW - unified theory of acceptance and use of technology KW - mobile phone N2 - Background: Mobile technologies are being increasingly developed to support the practice of medicine, nursing, and public health, including HIV testing and prevention. Chatbots using artificial intelligence (AI) are novel mobile health strategies that can promote HIV testing and prevention among men who have sex with men (MSM) in Malaysia, a hard-to-reach population at elevated risk of HIV, yet little is known about the features that are important to this key population. Objective: The aim of this study was to identify the barriers to and facilitators of Malaysian MSM's acceptance of an AI chatbot designed to assist in HIV testing and prevention in relation to its perceived benefits, limitations, and preferred features among potential users. Methods: We conducted 5 structured web-based focus group interviews with 31 MSM in Malaysia between July 2021 and September 2021. The interviews were first recorded, transcribed, coded, and thematically analyzed using NVivo (version 9; QSR International). Subsequently, the unified theory of acceptance and use of technology was used to guide data analysis to map emerging themes related to the barriers to and facilitators of chatbot acceptance onto its 4 domains: performance expectancy, effort expectancy, facilitating conditions, and social influence. 
Results: Multiple barriers and facilitators influencing MSM's acceptance of an AI chatbot were identified for each domain. Performance expectancy (ie, the perceived usefulness of the AI chatbot) was influenced by MSM's concerns about the AI chatbot's ability to deliver accurate information, its effectiveness in information dissemination and problem-solving, and its ability to provide emotional support and raise health awareness. Convenience, cost, and technical errors influenced the AI chatbot's effort expectancy (ie, the perceived ease of use). Efficient linkage to health care professionals and HIV self-testing was reported as a facilitating condition of MSM's receptiveness to using an AI chatbot to access HIV testing. Participants stated that social influence (ie, sociopolitical climate) factors influencing the acceptance of mobile technology that addressed HIV in Malaysia included privacy concerns, pervasive stigma against homosexuality, and the criminalization of same-sex sexual behaviors. Key design strategies that could enhance MSM's acceptance of an HIV prevention AI chatbot included an anonymous user setting; embedding the chatbot in MSM-friendly web-based platforms; and providing user-guiding questions and options related to HIV testing, prevention, and treatment. Conclusions: This study provides important insights into key features and potential implementation strategies central to designing an AI chatbot as a culturally sensitive digital health tool to prevent stigmatized health conditions in vulnerable and systematically marginalized populations. Such features not only are crucial to designing effective user-centered and culturally situated mobile health interventions for MSM in Malaysia but also illuminate the importance of incorporating social stigma considerations into health technology implementation strategies. 
UR - https://formative.jmir.org/2022/10/e42055 UR - http://dx.doi.org/10.2196/42055 UR - http://www.ncbi.nlm.nih.gov/pubmed/36201390 ID - info:doi/10.2196/42055 ER - TY - JOUR AU - Dhinagaran, Ardhithy Dhakshenya AU - Martinengo, Laura AU - Ho, Ringo Moon-Ho AU - Joty, Shafiq AU - Kowatsch, Tobias AU - Atun, Rifat AU - Tudor Car, Lorainne PY - 2022/10/4 TI - Designing, Developing, Evaluating, and Implementing a Smartphone-Delivered, Rule-Based Conversational Agent (DISCOVER): Development of a Conceptual Framework JO - JMIR Mhealth Uhealth SP - e38740 VL - 10 IS - 10 KW - conceptual framework KW - conversational agent KW - chatbot KW - mobile health KW - mHealth KW - digital health KW - mobile phone N2 - Background: Conversational agents (CAs), also known as chatbots, are computer programs that simulate human conversations by using predetermined rule-based responses or artificial intelligence algorithms. They are increasingly used in health care, particularly via smartphones. There is, at present, no conceptual framework guiding the development of smartphone-based, rule-based CAs in health care. To fill this gap, we propose structured and tailored guidance for their design, development, evaluation, and implementation. Objective: The aim of this study was to develop a conceptual framework for the design, evaluation, and implementation of smartphone-delivered, rule-based, goal-oriented, and text-based CAs for health care. Methods: We followed the approach by Jabareen, which was based on the grounded theory method, to develop this conceptual framework. We performed 2 literature reviews focusing on health care CAs and conceptual frameworks for the development of mobile health interventions. We identified, named, categorized, integrated, and synthesized the information retrieved from the literature reviews to develop the conceptual framework. We then applied this framework by developing a CA and testing it in a feasibility study. 
Results: The Designing, Developing, Evaluating, and Implementing a Smartphone-Delivered, Rule-Based Conversational Agent (DISCOVER) conceptual framework includes 8 iterative steps grouped into 3 stages, as follows: design, comprising defining the goal, creating an identity, assembling the team, and selecting the delivery interface; development, including developing the content and building the conversation flow; and the evaluation and implementation of the CA. They were complemented by 2 cross-cutting considerations (user-centered design, and privacy and security) that were relevant at all stages. This conceptual framework was successfully applied in the development of a CA to support lifestyle changes and prevent type 2 diabetes. Conclusions: Drawing on published evidence, the DISCOVER conceptual framework provides a step-by-step guide for developing rule-based, smartphone-delivered CAs. Further evaluation of this framework in diverse health care areas and settings and for a variety of users is needed to demonstrate its validity. Future research should aim to explore the use of CAs to deliver health care interventions, including behavior change and potential privacy and safety concerns. UR - https://mhealth.jmir.org/2022/10/e38740 UR - http://dx.doi.org/10.2196/38740 UR - http://www.ncbi.nlm.nih.gov/pubmed/36194462 ID - info:doi/10.2196/38740 ER - TY - JOUR AU - Martinengo, Laura AU - Jabir, Ishqi Ahmad AU - Goh, Tin Westin Wei AU - Lo, Wai Nicholas Yong AU - Ho, Ringo Moon-Ho AU - Kowatsch, Tobias AU - Atun, Rifat AU - Michie, Susan AU - Tudor Car, Lorainne PY - 2022/10/3 TI - Conversational Agents in Health Care: Scoping Review of Their Behavior Change Techniques and Underpinning Theory JO - J Med Internet Res SP - e39243 VL - 24 IS - 10 KW - behavior change KW - behavior change techniques KW - conversational agent KW - chatbot KW - mHealth N2 - Background: Conversational agents (CAs) are increasingly used in health care to deliver behavior change interventions. 
Their evaluation often includes categorizing the behavior change techniques (BCTs) using a classification system, of which the BCT Taxonomy v1 (BCTTv1) is one of the most common. Previous studies have presented descriptive summaries of behavior change interventions delivered by CAs, but no in-depth study reporting the use of BCTs in these interventions has been published to date. Objective: This review aims to describe behavior change interventions delivered by CAs and to identify the BCTs and theories guiding their design. Methods: We searched PubMed, Embase, Cochrane's Central Register of Controlled Trials, and the first 10 pages of Google and Google Scholar in April 2021. We included primary, experimental studies evaluating a behavior change intervention delivered by a CA. BCT coding followed the BCTTv1. Two independent reviewers selected the studies and extracted the data. Descriptive analysis and frequent itemset mining to identify BCT clusters were performed. Results: We included 47 studies reporting on mental health (n=19, 40%), chronic disorders (n=14, 30%), and lifestyle change (n=14, 30%) interventions. There were 20/47 embodied CAs (43%), and 27/47 CAs (57%) represented a female character. Most CAs were rule based (34/47, 72%). Experimental interventions included 63 BCTs (mean 9 BCTs; range 2-21 BCTs), while comparisons included 32 BCTs (mean 2 BCTs; range 2-17 BCTs). Most interventions included BCTs 4.1 "Instruction on how to perform a behavior" (34/47, 72%), 3.3 "Social support" (emotional; 27/47, 57%), and 1.2 "Problem solving" (24/47, 51%). A total of 12/47 studies (26%) were informed by a behavior change theory, mainly the Transtheoretical Model and the Social Cognitive Theory. Studies using the same behavior change theory included different BCTs. 
Conclusions: There is a need for the more explicit use of behavior change theories and improved reporting of BCTs in CA interventions to enhance the analysis of intervention effectiveness and improve the reproducibility of research. UR - https://www.jmir.org/2022/10/e39243 UR - http://dx.doi.org/10.2196/39243 UR - http://www.ncbi.nlm.nih.gov/pubmed/36190749 ID - info:doi/10.2196/39243 ER - TY - JOUR AU - Ta-Johnson, P. Vivian AU - Boatfield, Carolynn AU - Wang, Xinyu AU - DeCero, Esther AU - Krupica, C. Isabel AU - Rasof, D. Sophie AU - Motzer, Amelie AU - Pedryc, M. Wiktoria PY - 2022/10/3 TI - Assessing the Topics and Motivating Factors Behind Human-Social Chatbot Interactions: Thematic Analysis of User Experiences JO - JMIR Hum Factors SP - e38876 VL - 9 IS - 4 KW - social chatbots KW - Replika KW - emotional chatbots KW - artificial intelligence KW - thematic analysis KW - human-chatbot interactions KW - chatbot KW - usability KW - interaction KW - human factors KW - motivation KW - topics KW - AI KW - perception KW - usage N2 - Background: Although social chatbot usage is expected to increase as language models and artificial intelligence improve, very little is known about the dynamics of human-social chatbot interactions. Specifically, there is a paucity of research examining why human-social chatbot interactions are initiated and the topics that are discussed. Objective: We sought to identify the motivating factors behind initiating contact with Replika, a popular social chatbot, and the topics discussed in these interactions. Methods: A sample of Replika users completed a survey that included open-ended questions pertaining to the reasons why they initiated contact with Replika and the topics they typically discuss. Thematic analyses were then used to extract themes and subthemes regarding the motivational factors behind Replika use and the types of discussions that take place in conversations with Replika. 
Results: Users initiated contact with Replika out of interest, in search of social support, and to cope with mental and physical health conditions. Users engaged in a wide variety of discussion topics with their Replika, including intellectual topics, life and work, recreation, mental health, connection, Replika, current events, and other people. Conclusions: Given the wide range of motivational factors and discussion topics that were reported, our results imply that multifaceted support can be provided by a single social chatbot. While previous research already established that social chatbots can effectively help address mental and physical health issues, these capabilities have been dispersed across several different social chatbots instead of deriving from a single one. Our results also highlight a motivating factor of human-social chatbot usage that has received less attention than other motivating factors: interest. Users most frequently reported using Replika out of interest and sought to explore its capabilities and learn more about artificial intelligence. Thus, while developers and researchers study human-social chatbot interactions with the efficacy of the social chatbot and its targeted user base in mind, it is equally important to consider how its usage can shape public perceptions and support for social chatbots and artificial agents in general. UR - https://humanfactors.jmir.org/2022/4/e38876 UR - http://dx.doi.org/10.2196/38876 UR - http://www.ncbi.nlm.nih.gov/pubmed/36190745 ID - info:doi/10.2196/38876 ER - TY - JOUR AU - Whittaker, Robyn AU - Dobson, Rosie AU - Garner, Katie PY - 2022/9/26 TI - Chatbots for Smoking Cessation: Scoping Review JO - J Med Internet Res SP - e35556 VL - 24 IS - 9 KW - chatbot KW - conversational agent KW - COVID-19 KW - smoking cessation N2 - Background: Despite significant progress in reducing tobacco use over the past 2 decades, tobacco still kills over 8 million people every year. 
Digital interventions, such as text messaging, have been found to help people quit smoking. Chatbots, or conversational agents, are new digital tools that mimic instantaneous human conversation and therefore could extend the effectiveness of text messaging. Objective: This scoping review aims to assess the extent of research in the chatbot literature for smoking cessation and provide recommendations for future research in this area. Methods: Relevant studies were identified through searches conducted in Embase, MEDLINE, APA PsycINFO, Google Scholar, and Scopus, as well as additional searches on JMIR, Cochrane Library, Lancet Digital Health, and Digital Medicine. Studies were considered if they were conducted with tobacco smokers, were conducted between 2000 and 2021, were available in English, and included a chatbot intervention. Results: Of 323 studies identified, 10 studies were included in the review (3 framework articles, 1 study protocol, 2 pilot studies, 2 trials, and 2 randomized controlled trials). Most studies noted some benefits related to smoking cessation and participant engagement; however, outcome measures varied considerably. The quality of the studies overall was low, with methodological issues and low follow-up rates. Conclusions: More research is needed to make a firm conclusion about the efficacy of chatbots for smoking cessation. Researchers need to provide more in-depth descriptions of chatbot functionality, mode of delivery, and theoretical underpinnings. Consistency in language and terminology would also assist in reviews of what approaches work across the field. 
UR - https://www.jmir.org/2022/9/e35556 UR - http://dx.doi.org/10.2196/35556 UR - http://www.ncbi.nlm.nih.gov/pubmed/36095295 ID - info:doi/10.2196/35556 ER - TY - JOUR AU - Danieli, Morena AU - Ciulli, Tommaso AU - Mousavi, Mahed Seyed AU - Silvestri, Giorgia AU - Barbato, Simone AU - Di Natale, Lorenzo AU - Riccardi, Giuseppe PY - 2022/9/23 TI - Assessing the Impact of Conversational Artificial Intelligence in the Treatment of Stress and Anxiety in Aging Adults: Randomized Controlled Trial JO - JMIR Ment Health SP - e38067 VL - 9 IS - 9 KW - mental health care KW - conversational artificial intelligence KW - mobile health KW - mHealth KW - personal health care agent N2 - Background: While mental health applications are increasingly becoming available for large populations of users, there is a lack of controlled trials on the impacts of such applications. Artificial intelligence (AI)-empowered agents have been evaluated when assisting adults with cognitive impairments; however, few applications are available for aging adults who are still actively working. These adults often have high stress levels related to changes in their workplaces, and related symptoms eventually affect their quality of life. Objective: We aimed to evaluate the contribution of TEO (Therapy Empowerment Opportunity), a mobile personal health care agent with conversational AI. TEO promotes mental health and well-being by engaging patients in conversations to recollect the details of events that increased their anxiety and by providing therapeutic exercises and suggestions. Methods: The study was based on a protocolized intervention for stress and anxiety management. Participants with stress symptoms and mild-to-moderate anxiety received an 8-week cognitive behavioral therapy (CBT) intervention delivered remotely. A group of participants also interacted with the agent TEO. The participants were active workers aged over 55 years.
The experimental groups were as follows: group 1, traditional therapy; group 2, traditional therapy and mobile health (mHealth) agent; group 3, mHealth agent; and group 4, no treatment (assigned to a waiting list). Symptoms related to stress (anxiety, physical disease, and depression) were assessed prior to treatment (T1), at the end (T2), and 3 months after treatment (T3), using standardized psychological questionnaires. Moreover, the Patient Health Questionnaire-8 and Generalized Anxiety Disorder-7 scales were administered before the intervention (T1), at mid-term (T2), at the end of the intervention (T3), and after 3 months (T4). At the end of the intervention, participants in groups 1, 2, and 3 filled in a satisfaction questionnaire. Results: Despite randomization, statistically significant differences between groups were present at T1. Group 4 showed lower levels of anxiety and depression compared with group 1, and lower levels of stress compared with group 2. Comparisons between groups at T2 and T3 did not show significant differences in outcomes. Analyses conducted within groups showed significant differences between times in group 2, with greater improvements in the levels of stress and scores related to overall well-being. A general worsening trend between T2 and T3 was detected in all groups, with a significant increase in stress levels in group 2. Group 2 reported higher levels of perceived usefulness and satisfaction. Conclusions: No statistically significant differences could be observed between participants who used the mHealth app alone or within the traditional CBT setting. However, the results indicated significant differences within the groups that received treatment and a stable tendency toward improvement, which was limited to individual perceptions of stress-related symptoms.
Trial Registration: ClinicalTrials.gov NCT04809090; https://clinicaltrials.gov/ct2/show/NCT04809090 UR - https://mental.jmir.org/2022/9/e38067 UR - http://dx.doi.org/10.2196/38067 UR - http://www.ncbi.nlm.nih.gov/pubmed/36149730 ID - info:doi/10.2196/38067 ER - TY - JOUR AU - Zidoun, Youness AU - Kaladhara, Sreelekshmi AU - Powell, Leigh AU - Nour, Radwa AU - Al Suwaidi, Hanan AU - Zary, Nabil PY - 2022/8/23 TI - Contextual Conversational Agent to Address Vaccine Hesitancy: Protocol for a Design-Based Research Study JO - JMIR Res Protoc SP - e38043 VL - 11 IS - 8 KW - conversational agent KW - design-based research KW - chatbot KW - Rasa KW - NLU KW - COVID-19 KW - vaccine hesitancy KW - misinformation KW - vaccination KW - iterative design KW - health communication KW - health information KW - System Usability Scale N2 - Background: Since the beginning of the COVID-19 pandemic, people have been exposed to misinformation, leading to many myths about SARS-CoV-2 and the vaccines against it. As this situation does not seem likely to end soon, many authorities and health organizations, including the World Health Organization (WHO), are utilizing conversational agents (CAs) in their fight against it. Although the impact and usage of these novel digital strategies are noticeable, the design of the CAs remains key to their success. Objective: This study describes the use of design-based research (DBR) for contextual CA design to address vaccine hesitancy. In addition, this protocol will examine the impact of DBR on CA design to understand how this iterative process can enhance accuracy and performance. Methods: A DBR methodology will be used for this study. Each phase of analysis, design, and evaluation in a design cycle informs the next via its outcomes. An anticipated generic strategy will be formed after completing the first iteration. Using multiple research studies, frameworks and theoretical approaches are tested and evaluated through the different design cycles.
User perception of the CA will be assessed during every evaluation phase through a usability assessment using the System Usability Scale. The PARADISE (PARAdigm for Dialogue System Evaluation) method will be adopted to calculate the performance of this text-based CA. Results: Two phases of the first design cycle (design and evaluation) were completed at the time of this writing (April 2022). The research team is currently reviewing the natural-language understanding model as part of the conversation-driven development (CDD) process in preparation for the first pilot intervention, which will conclude the CA's first design cycle. In addition, conversational data will be analyzed quantitatively and qualitatively as part of the reflection and revision process to inform the subsequent design cycles. This project plans for three rounds of design cycles, each expected to result in studies disseminating outcomes and conclusions. The results of the first study describing the entire first design cycle are expected to be submitted for publication before the end of 2022. Conclusions: CAs constitute an innovative way of delivering health communication information. However, they are primarily used to contribute to behavioral change or educate people about health issues. Therefore, health chatbots should be carefully designed to achieve their intended outcomes. DBR can help shape a holistic understanding of the process of CA conception. This protocol describes the design of VWise, a contextual CA that aims to address vaccine hesitancy using the DBR methodology. The results of this study will help identify the strengths and flaws of DBR's application to such innovative projects.
UR - https://www.researchprotocols.org/2022/8/e38043 UR - http://dx.doi.org/10.2196/38043 UR - http://www.ncbi.nlm.nih.gov/pubmed/35797423 ID - info:doi/10.2196/38043 ER - TY - JOUR AU - Bui, An Truong AU - Pohl, Megan AU - Rosenfelt, Cory AU - Ogourtsova, Tatiana AU - Yousef, Mahdieh AU - Whitlock, Kerri AU - Majnemer, Annette AU - Nicholas, David AU - Demmans Epp, Carrie AU - Zaiane, Osmar AU - Bolduc, V. François PY - 2022/8/19 TI - Identifying Potential Gamification Elements for A New Chatbot for Families With Neurodevelopmental Disorders: User-Centered Design Approach JO - JMIR Hum Factors SP - e31991 VL - 9 IS - 3 KW - gamification KW - chatbot KW - neurodevelopmental disorders KW - engagement KW - mobile health KW - mHealth KW - eHealth KW - focus group KW - interview KW - user-centered design KW - health information technologies N2 - Background: Chatbots have been increasingly considered for applications in the health care field. However, it remains unclear how a chatbot can assist users with complex health needs, such as parents of children with neurodevelopmental disorders (NDDs) who need ongoing support. Often, this population must deal with complex and overwhelming health information, which can make parents less likely to use software that may be very helpful. An approach to enhance user engagement is incorporating game elements in nongame contexts, known as gamification. Gamification needs to be tailored to users; however, there has been no previous assessment of gamification use in chatbots for NDDs. Objective: We sought to examine how gamification elements are perceived and whether their implementation in chatbots will be well received among parents of children with NDDs. We have discussed some elements in detail as the initial step of the project. Methods: We performed a narrative literature review of gamification elements, specifically those used in health and education.
Among the elements identified in the literature, our health and social science experts in NDDs prioritized five elements for in-depth discussion: goal setting, customization, rewards, social networking, and unlockable content. We used a qualitative approach, which included focus groups and interviews with parents of children with NDDs (N=21), to assess the acceptability of the potential implementation of these elements in an NDD-focused chatbot. Parents were asked about their opinions on the 5 elements and to rate them. Video and audio recordings were transcribed and summarized for emerging themes, using deductive and inductive thematic approaches. Results: From the responses obtained from 21 participants, we identified three main themes: parents of children with NDDs were familiar with and had positive experiences with gamification; a specific element (goal setting) was important to all parents, whereas others (customization, rewards, and unlockable content) received mixed opinions; and the social networking element received positive feedback, but concerns about information accuracy were raised. Conclusions: We showed for the first time that parents of children with NDDs support gamification use in a chatbot for NDDs. Our study illustrates the need for a user-centered design in the medical domain and provides a foundation for researchers interested in developing chatbots for populations that are medically vulnerable. Future studies exploring a wide range of gamification elements with a large number of potential users are needed to understand the impact of gamification elements in enhancing knowledge mobilization.
UR - https://humanfactors.jmir.org/2022/3/e31991 UR - http://dx.doi.org/10.2196/31991 UR - http://www.ncbi.nlm.nih.gov/pubmed/35984679 ID - info:doi/10.2196/31991 ER - TY - JOUR AU - Selmouni, Farida AU - Guy, Marine AU - Muwonge, Richard AU - Nassiri, Abdelhak AU - Lucas, Eric AU - Basu, Partha AU - Sauvaget, Catherine PY - 2022/8/2 TI - Effectiveness of Artificial Intelligence-Assisted Decision-making to Improve Vulnerable Women's Participation in Cervical Cancer Screening in France: Protocol for a Cluster Randomized Controlled Trial (AppDate-You) JO - JMIR Res Protoc SP - e39288 VL - 11 IS - 8 KW - cervical cancer KW - screening KW - chatbot KW - decision aid KW - artificial intelligence KW - cluster randomized controlled trial N2 - Background: The French organized population-based cervical cancer screening (CCS) program transitioned from a cytology-based to a human papillomavirus (HPV)-based screening strategy in August 2020. HPV testing is offered every 5 years, starting at the age of 30 years. In the new program, women are invited to undergo an HPV test at a gynecologist's, primary care physician's, or midwife's office, a private clinic or health center, family planning center, or hospital. HPV self-sampling (HPVss) was also made available as an additional approach. However, French studies reported that less than 20% of noncompliant women performed vaginal self-sampling when a kit was sent to their home. Women with lower income and educational levels participate less in CCS. Lack of information about the disease and the benefits of CCS was reported as one of the major barriers among noncompliant women. This barrier could be addressed by overcoming disparities in HPV- and cervical cancer-related knowledge and perceptions about CCS. Objective: This study aimed to assess the effectiveness of a chatbot-based decision aid to improve women's participation in the HPVss detection-based CCS care pathway.
Methods: AppDate-You is a 2-arm cluster randomized controlled trial (cRCT) nested within the French organized CCS program. Eligible women are those aged 30-65 years who have not been screened for cervical cancer for more than 4 years and live in the disadvantaged clusters in the Occitanie Region, France. In total, 32 clusters will be allocated to the intervention and control arms, 16 in each arm (approximately 4000 women). Eligible women living in randomly selected disadvantaged clusters will be identified using the Regional Cancer Screening Coordinating Centre of Occitanie (CRCDC-OC) database. Women in the experimental group will receive screening reminder letters and HPVss kits, combined with access to a chatbot-based decision aid tailored to women with lower educational attainment. Women in the control group will receive the reminder letters and HPVss kits (standard of care). The CRCDC-OC database will be used to check trial progress and assess the intervention's impact. The trial has 2 primary outcomes: (1) the proportion of screening participation within 12 months among women recalled for CCS and (2) the proportion of HPVss-positive women who are "well-managed" as stipulated in the French guidelines. Results: To date, the AppDate-You study group is preparing and developing the chatbot-based decision aid (intervention). The cRCT will be conducted once the decision aid has been completed and validated. Recruitment of women is expected to begin in January 2023. Conclusions: This study is the first to evaluate the impact of a chatbot-based decision aid to promote the CCS program and increase its performance. The study results will inform policy makers and health professionals as well as the research community.
Trial Registration: ClinicalTrials.gov NCT05286034; https://clinicaltrials.gov/ct2/show/NCT05286034 International Registered Report Identifier (IRRID): PRR1-10.2196/39288 UR - https://www.researchprotocols.org/2022/8/e39288 UR - http://dx.doi.org/10.2196/39288 UR - http://www.ncbi.nlm.nih.gov/pubmed/35771872 ID - info:doi/10.2196/39288 ER - TY - JOUR AU - Noble, M. Jasmine AU - Zamani, Ali AU - Gharaat, MohamadAli AU - Merrick, Dylan AU - Maeda, Nathanial AU - Lambe Foster, Alex AU - Nikolaidis, Isabella AU - Goud, Rachel AU - Stroulia, Eleni AU - Agyapong, O. Vincent I. AU - Greenshaw, J. Andrew AU - Lambert, Simon AU - Gallson, Dave AU - Porter, Ken AU - Turner, Debbie AU - Zaiane, Osmar PY - 2022/7/25 TI - Developing, Implementing, and Evaluating an Artificial Intelligence-Guided Mental Health Resource Navigation Chatbot for Health Care Workers and Their Families During and Following the COVID-19 Pandemic: Protocol for a Cross-sectional Study JO - JMIR Res Protoc SP - e33717 VL - 11 IS - 7 KW - eHealth KW - chatbot KW - conversational agent KW - health system navigation KW - electronic health care KW - mobile phone N2 - Background: Approximately 1 in 3 Canadians will experience an addiction or mental health challenge at some point in their lifetime. Unfortunately, there are multiple barriers to accessing mental health care, including system fragmentation, episodic care, long wait times, and insufficient support for health system navigation. In addition, stigma may further reduce an individual's likelihood of seeking support. Digital technologies present new and exciting opportunities to bridge significant gaps in mental health care service provision, reduce barriers pertaining to stigma, and improve health outcomes for patients and mental health system integration and efficiency.
Chatbots (ie, software systems that use artificial intelligence to carry out conversations with people) may be explored to support those in need of information or access to services and present the opportunity to address gaps in traditional, fragmented, or episodic mental health system structures on demand with personalized attention. The recent COVID-19 pandemic has exacerbated even further the need for mental health support among Canadians and called attention to the inefficiencies of our system. As health care workers and their families are at an even greater risk of mental illness and psychological distress during the COVID-19 pandemic, this technology will be first piloted with the goal of supporting this vulnerable group. Objective: This pilot study seeks to evaluate the effectiveness of the Mental Health Intelligent Information Resource Assistant in supporting health care workers and their families in the Canadian provinces of Alberta and Nova Scotia with the provision of appropriate information on mental health issues, services, and programs based on personalized needs. Methods: The effectiveness of the technology will be assessed via voluntary follow-up surveys and an analysis of client interactions and engagement with the chatbot. Client satisfaction with the chatbot will also be assessed. Results: This project was initiated on April 1, 2021. Ethics approval was granted on August 12, 2021, by the University of Alberta Health Research Board (PRO00109148) and on April 21, 2022, by the Nova Scotia Health Authority Research Ethics Board (1027474). Data collection is anticipated to take place from May 2, 2022, to May 2, 2023. Publication of preliminary results will be sought in spring or summer 2022, with a more comprehensive evaluation completed by spring 2023 following the collection of a larger data set. 
Conclusions: Our findings can be incorporated into public policy and planning around mental health system navigation by Canadian mental health care providers, from large public health authorities to small community-based, not-for-profit organizations. This may serve to support the development of an additional touch point, or point of entry, for individuals to access the appropriate services or care when they need them, wherever they are. International Registered Report Identifier (IRRID): PRR1-10.2196/33717 UR - https://www.researchprotocols.org/2022/7/e33717 UR - http://dx.doi.org/10.2196/33717 UR - http://www.ncbi.nlm.nih.gov/pubmed/35877158 ID - info:doi/10.2196/33717 ER - TY - JOUR AU - Shan, Yi AU - Ji, Meng AU - Xie, Wenxiu AU - Qian, Xiaobo AU - Li, Rongying AU - Zhang, Xiaomin AU - Hao, Tianyong PY - 2022/7/8 TI - Language Use in Conversational Agent-Based Health Communication: Systematic Review JO - J Med Internet Res SP - e37403 VL - 24 IS - 7 KW - systematic review KW - health communication KW - language use KW - conversational agent N2 - Background: Given the growing significance of conversational agents (CAs), researchers have conducted a plethora of relevant studies on various technology- and usability-oriented issues. However, few investigations focus on language use in CA-based health communication to examine its influence on the user perception of CAs and their role in delivering health care services. Objective: This review aims to present the language use of CAs in health care to identify the achievements made and breakthroughs to be realized to inform researchers and more specifically CA designers. Methods: This review was conducted by following the protocols of the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 statement. We first designed the search strategy according to the research aim and then performed the keyword searches in PubMed and ProQuest databases for retrieving relevant publications (n=179).
Subsequently, 3 researchers screened and reviewed the publications independently to select studies meeting the predefined selection criteria. Finally, we synthesized and analyzed the eligible articles (N=11) through thematic synthesis. Results: Among the 11 included publications, 6 deal exclusively with the language use of the CAs studied, and the remaining 5 are only partly related to this topic. The language use of the CAs in these studies can be roughly classified into six themes: (1) personal pronouns, (2) responses to health and lifestyle prompts, (3) strategic wording and rich linguistic resources, (4) a 3-staged conversation framework, (5) human-like well-manipulated conversations, and (6) symbols and images coupled with phrases. These derived themes effectively engaged users in health communication. Meanwhile, we identified substantial room for improvement based on the inconsistent responses of some CAs and their inability to present large volumes of information on safety-critical health and lifestyle prompts. Conclusions: This is the first systematic review of language use in CA-based health communication. The results and limitations identified in the 11 included papers can give fresh insights into the design and development, popularization, and research of CA applications. This review can provide practical implications for incorporating positive language use into the design of health CAs and improving their effective language output in health communication. In this way, upgraded CAs will be more capable of handling various health problems particularly in the context of nationwide and even worldwide public health crises. UR - https://www.jmir.org/2022/7/e37403 UR - http://dx.doi.org/10.2196/37403 UR - http://www.ncbi.nlm.nih.gov/pubmed/35802407 ID - info:doi/10.2196/37403 ER - TY - JOUR AU - Alphonse, Alice AU - Stewart, Kezia AU - Brown, Jamie AU - Perski, Olga PY - 2022/7/7 TI - Exploring Users'
Experiences With a Quick-Response Chatbot Within a Popular Smoking Cessation Smartphone App: Semistructured Interview Study JO - JMIR Form Res SP - e36869 VL - 6 IS - 7 KW - chatbot KW - conversational agent KW - engagement KW - smartphone app KW - smoking cessation KW - accountability KW - mobile phone N2 - Background: Engagement with smartphone apps for smoking cessation tends to be low. Chatbots (ie, software that enables conversations with users) offer a promising means of increasing engagement. Objective: We aimed to explore smokers' experiences with a quick-response chatbot (Quit Coach) implemented within a popular smoking cessation app and identify factors that influence users' engagement with Quit Coach. Methods: In-depth, one-to-one, semistructured qualitative interviews were conducted with adult, past-year smokers who had voluntarily used Quit Coach in a recent smoking cessation attempt (5/14, 36%) and current smokers who agreed to download and use Quit Coach for a minimum of 2 weeks to support a new cessation attempt (9/14, 64%). Verbal reports were audio recorded, transcribed verbatim, and analyzed within a constructivist theoretical framework using inductive thematic analysis. Results: A total of 3 high-order themes were generated to capture users' experiences and engagement with Quit Coach: anthropomorphism of and accountability to Quit Coach (ie, users ascribing human-like characteristics and thoughts to the chatbot, which helped foster a sense of accountability to it), Quit Coach's interaction style and format (eg, positive and motivational tone of voice and quick and easy-to-complete check-ins), and users' perceived need for support (ie, chatbot engagement was motivated by seeking distraction from cravings or support to maintain motivation to stay quit). Conclusions: Anthropomorphism of a quick-response chatbot implemented within a popular smoking cessation app appeared to be enabled by its interaction style and format and users'
perceived need for support, which may have given rise to feelings of accountability and increased engagement. UR - https://formative.jmir.org/2022/7/e36869 UR - http://dx.doi.org/10.2196/36869 UR - http://www.ncbi.nlm.nih.gov/pubmed/35797093 ID - info:doi/10.2196/36869 ER - TY - JOUR AU - Weeks, Rose AU - Cooper, Lyra AU - Sangha, Pooja AU - Sedoc, João AU - White, Sydney AU - Toledo, Assaf AU - Gretz, Shai AU - Lahav, Dan AU - Martin, Nina AU - Michel, Alexandra AU - Lee, Hyoung Jae AU - Slonim, Noam AU - Bar-Zeev, Naor PY - 2022/7/6 TI - Chatbot-Delivered COVID-19 Vaccine Communication Message Preferences of Young Adults and Public Health Workers in Urban American Communities: Qualitative Study JO - J Med Internet Res SP - e38418 VL - 24 IS - 7 KW - vaccine hesitancy KW - COVID-19 KW - chatbots KW - AI KW - artificial intelligence KW - natural language processing KW - social media KW - vaccine communication KW - digital health KW - misinformation KW - infodemic KW - infodemiology KW - conversational agent KW - public health KW - user need KW - vaccination KW - health communication KW - online health information N2 - Background: Automated conversational agents, or chatbots, have a role in reinforcing evidence-based guidance delivered through other media and offer an accessible, individually tailored channel for public engagement. In early-to-mid 2021, young adults and minority populations disproportionately affected by COVID-19 in the United States were more likely to be hesitant toward COVID-19 vaccines, citing concerns regarding vaccine safety and effectiveness. Successful chatbot communication requires purposive understanding of user needs. Objective: We aimed to review the acceptability of messages to be delivered by a chatbot named VIRA from Johns Hopkins University. 
The study investigated which message styles were preferred by young, urban-dwelling Americans as well as public health workers, since we anticipated that the chatbot would be used by the latter as a job aid. Methods: We conducted 4 web-based focus groups with 20 racially and ethnically diverse young adults aged 18-28 years and public health workers aged 25-61 years living in or near eastern-US cities. We tested 6 message styles, asking participants to select a preferred response style for a chatbot answering common questions about COVID-19 vaccines. We transcribed, coded, and categorized emerging themes within the discussions of message content, style, and framing. Results: Participants preferred messages that began with an empathetic reflection of a user concern and concluded with a straightforward, fact-supported response. Most participants disapproved of moralistic or reasoning-based appeals to get vaccinated, although public health workers felt that such strong statements appealing to communal responsibility were warranted. Responses tested with humor and testimonials did not appeal to the participants. Conclusions: To foster credibility, chatbots targeting young people with vaccine-related messaging should aim to build rapport with users by deploying empathic, reflective statements, followed by direct and comprehensive responses to user queries. Further studies are needed to inform the appropriate use of user-customized testimonials and humor in the context of chatbot communication. 
UR - https://www.jmir.org/2022/7/e38418 UR - http://dx.doi.org/10.2196/38418 UR - http://www.ncbi.nlm.nih.gov/pubmed/35737898 ID - info:doi/10.2196/38418 ER - TY - JOUR AU - Moore, Nathan AU - Ahmadpour, Naseem AU - Brown, Martin AU - Poronnik, Philip AU - Davids, Jennifer PY - 2022/7/6 TI - Designing Virtual Reality-Based Conversational Agents to Train Clinicians in Verbal De-escalation Skills: Exploratory Usability Study JO - JMIR Serious Games SP - e38669 VL - 10 IS - 3 KW - virtual reality KW - code black KW - verbal de-escalation KW - violence and aggression KW - education KW - clinical training KW - conversational agent N2 - Background: Violence and aggression are significant workplace challenges faced by clinicians worldwide. Traditional methods of training consist of "on-the-job learning" and role-play simulations. Although both approaches can result in improved skill levels, they are not without limitation. Interactive simulations using virtual reality (VR) can complement traditional training processes as a cost-effective, engaging, easily accessible, and flexible training tool. Objective: In this exploratory study, we aimed to determine the feasibility of and barriers to verbal engagement with a virtual agent in the context of the Code Black VR application. Code Black VR is a new interactive VR-based verbal de-escalation trainer that we developed based on the Clinical Training Through VR Design Framework. Methods: In total, 28 participants with varying clinical expertise from 4 local hospitals enrolled in the Western Sydney Local Health District Clinical Initiative Nurse program and Transition to Emergency Nursing Programs and participated in 1 of 5 workshops. They completed multiple playthroughs of the Code Black VR verbal de-escalation trainer application and verbally interacted with a virtual agent. We documented observations and poststudy reflection notes. 
After the playthroughs, the users completed the System Usability Scale and provided written comments on their experience. A thematic analysis was conducted on the results. Data were also obtained through the application itself, which recorded the total interactions and successfully completed interactions. Results: The Code Black VR verbal de-escalation training application was well received. The findings reinforced the factors in the existing design framework and identified 3 new factors (motion sickness, perceived value, and privacy) to be considered for future application development. Conclusions: Verbal interaction with a virtual agent is feasible for training staff in verbal de-escalation skills, and it is an effective medium to supplement clinician training in these skills. We provide broader design considerations to guide further developments in this area. UR - https://games.jmir.org/2022/3/e38669 UR - http://dx.doi.org/10.2196/38669 UR - http://www.ncbi.nlm.nih.gov/pubmed/35793129 ID - info:doi/10.2196/38669 ER - TY - JOUR AU - Davoudi, Anahita AU - Lee, S. Natalie AU - Luong, ThaiBinh AU - Delaney, Timothy AU - Asch, Elizabeth AU - Chaiyachati, Krisda AU - Mowery, Danielle PY - 2022/6/29 TI - Identifying Medication-Related Intents From a Bidirectional Text Messaging Platform for Hypertension Management Using an Unsupervised Learning Approach: Retrospective Observational Pilot Study JO - J Med Internet Res SP - e36151 VL - 24 IS - 6 KW - chatbots KW - secure messaging systems KW - unsupervised learning KW - latent Dirichlet allocation KW - natural language processing N2 - Background: Free-text communication between patients and providers plays an increasing role in chronic disease management, through platforms varying from traditional health care portals to novel mobile messaging apps. These text data are rich resources for clinical purposes, but their sheer volume renders them difficult to manage. 
Even automated approaches, such as natural language processing, require labor-intensive manual classification for developing training data sets. Automated approaches to organizing free-text data are necessary to facilitate use of free-text communication for clinical care. Objective: The aim of this study was to apply unsupervised learning approaches to (1) understand the types of topics discussed and (2) learn medication-related intents from messages sent between patients and providers through a bidirectional text messaging system for managing participant blood pressure (BP). Methods: This study was a secondary analysis of deidentified messages from a remote, mobile, text-based employee hypertension management program at an academic institution. We trained a latent Dirichlet allocation (LDA) model for each message type (ie, inbound patient messages and outbound provider messages) and identified the distribution of major topics and significant topics (probability >.20) across message types. Next, we annotated all medication-related messages with a single medication intent. Then, we trained a second medication-specific LDA (medLDA) model to assess how well the unsupervised method could identify more fine-grained medication intents. We encoded each medication message with n-grams (n=1-3 words) using spaCy, clinical named entities using Stanza, and medication categories using MedEx; we then applied chi-square feature selection to learn the most informative features associated with each medication intent. Results: In total, 253 participants and 5 providers engaged in the program, generating 12,131 total messages: 46.90% (n=5689) patient messages and 53.10% (n=6442) provider messages. Most patient messages corresponded to BP reporting, BP encouragement, and appointment scheduling; most provider messages corresponded to BP reporting, medication adherence, and confirmatory statements. 
Most patient and provider messages contained 1 topic, and few contained more than 3 topics, as identified using LDA. In total, 534 medication messages were annotated with a single medication intent. Of these, 282 (52.8%) were patient medication messages: most referred to the medication request intent (n=134, 47.5%). Most of the 252 (47.2%) provider medication messages referred to the medication question intent (n=173, 68.7%). Although the medLDA model could identify a majority intent within each topic, it could not distinguish medication intents with low prevalence within patient or provider messages. Richer feature engineering identified informative lexical-semantic patterns associated with each medication intent class. Conclusions: LDA can be an effective method for generating subgroups of messages with similar term usage and facilitating the review of topics to inform annotations. However, few training cases and shared vocabulary between intents preclude the use of LDA for fully automated, deep, medication intent classification. International Registered Report Identifier (IRRID): RR2-10.1101/2021.12.23.21268061 UR - https://www.jmir.org/2022/6/e36151 UR - http://dx.doi.org/10.2196/36151 UR - http://www.ncbi.nlm.nih.gov/pubmed/35767327 ID - info:doi/10.2196/36151 ER - TY - JOUR AU - Olano-Espinosa, Eduardo AU - Avila-Tomas, Francisco Jose AU - Minue-Lorenzo, Cesar AU - Matilla-Pardo, Blanca AU - Serrano Serrano, Encarnación María AU - Martinez-Suberviola, Javier F. 
AU - Gil-Conesa, Mario AU - Del Cura-González, Isabel PY - 2022/6/27 TI - Effectiveness of a Conversational Chatbot (Dejal@bot) for the Adult Population to Quit Smoking: Pragmatic, Multicenter, Controlled, Randomized Clinical Trial in Primary Care JO - JMIR Mhealth Uhealth SP - e34273 VL - 10 IS - 6 KW - smoking KW - tobacco cessation KW - primary care KW - smartphone use KW - chatbot KW - dialog systems KW - artificial intelligence KW - tobacco KW - mHealth N2 - Background: Tobacco addiction is the leading cause of preventable morbidity and mortality worldwide, but only 1 in 20 cessation attempts is supervised by a health professional. The potential advantages of mobile health (mHealth) can circumvent this problem and facilitate tobacco cessation interventions for public health systems. Given its easy scalability to large populations and great potential, chatbots are a potentially useful complement to usual treatment. Objective: This study aims to assess the effectiveness of an evidence-based intervention to quit smoking via a chatbot in smartphones compared with usual clinical practice in primary care. Methods: This is a pragmatic, multicenter, controlled, and randomized clinical trial involving 34 primary health care centers within the Madrid Health Service (Spain). Smokers over the age of 18 years who attended on-site consultation and accepted help to quit tobacco were recruited by their doctor or nurse and randomly allocated to receive usual care (control group [CG]) or an evidence-based chatbot intervention (intervention group [IG]). The interventions in both arms were based on the 5A's (ie, Ask, Advise, Assess, Assist, and Arrange) in the US Clinical Practice Guideline, which combines behavioral and pharmacological treatments and is structured in several follow-up appointments. The primary outcome was continuous abstinence from smoking that was biochemically validated after 6 months by the collaborators. 
The outcome analysis was blinded to allocation of patients, although participants were unblinded to group assignment. An intention-to-treat analysis, using the baseline-observation-carried-forward approach for missing data, and logistic regression models with robust estimators were employed for assessing the primary outcomes. Results: The trial was conducted between October 1, 2018, and March 31, 2019. The sample included 513 patients (242 in the IG and 271 in the CG), with an average age of 49.8 (SD 10.82) years and gender ratio of 59.3% (304/513) women and 40.7% (209/513) men. Of them, 232 patients (45.2%) completed the follow-up, 104/242 (42.9%) in the IG and 128/271 (47.2%) in the CG. In the intention-to-treat analysis, the biochemically validated abstinence rate at 6 months was higher in the IG (63/242, 26%) compared with that in the CG (51/271, 18.8%; odds ratio 1.52, 95% CI 1.00-2.31; P=.05). After adjusting for basal CO-oximetry and bupropion intake, no substantial changes were observed (odds ratio 1.52, 95% CI 0.99-2.33; P=.05; pseudo-R2=0.045). In the IG, 61.2% (148/242) of users accessed the chatbot, average chatbot-patient interaction time was 121 (95% CI 121.1-140.0) minutes, and average number of contacts was 45.56 (SD 36.32). Conclusions: A treatment including a chatbot for helping with tobacco cessation was more effective than usual clinical practice in primary care. However, this outcome was at the limit of statistical significance, and therefore these promising results must be interpreted with caution. 
Trial Registration: ClinicalTrials.gov NCT03445507; https://tinyurl.com/mrnfcmtd International Registered Report Identifier (IRRID): RR2-10.1186/s12911-019-0972-z UR - https://mhealth.jmir.org/2022/6/e34273 UR - http://dx.doi.org/10.2196/34273 UR - http://www.ncbi.nlm.nih.gov/pubmed/35759328 ID - info:doi/10.2196/34273 ER - TY - JOUR AU - Dulin, Patrick AU - Mertz, Robyn AU - Edwards, Alexandra AU - King, Diane PY - 2022/5/16 TI - Contrasting a Mobile App With a Conversational Chatbot for Reducing Alcohol Consumption: Randomized Controlled Pilot Trial JO - JMIR Form Res SP - e33037 VL - 6 IS - 5 KW - alcohol KW - hazardous drinking KW - smartphone app KW - chatbot KW - brief intervention KW - effectiveness KW - utilization KW - mobile phone N2 - Background: Mobile apps have shown considerable promise for reducing alcohol consumption among problem drinkers, but like many mobile health apps, they frequently report low utilization, which is an important limitation, as research suggests that effectiveness is related to higher utilization. Interactive chatbots have the ability to provide a conversational interface with users and may be more engaging and result in higher utilization and effectiveness, but there is limited research into this possibility. Objective: This study aimed to develop a chatbot alcohol intervention based on an empirically supported app (Step Away) for reducing drinking and to conduct a pilot trial of the 2 interventions. Included participants met the criteria for hazardous drinking and were interested in reducing alcohol consumption. The study assessed utilization patterns and alcohol outcomes across the 2 technology conditions, and a waitlist control group. Methods: Participants were recruited using Facebook advertisements. 
Those who met the criteria for hazardous consumption and expressed an interest in changing their drinking habits were randomly assigned to 3 conditions: the Step Away app, the Step Away chatbot, and a waitlist control condition. Participants were assessed on the web using the Alcohol Use Disorders Identification Test, Adapted for Use in the United States, Readiness to Change Questionnaire, Short Inventory of Problems-Revised, and Timeline Followback at baseline and at the 12-week follow-up. Results: A total of 150 participants who completed the baseline and follow-up assessments were included in the final analysis. ANOVA results indicated that participants in the 3 conditions changed their drinking from baseline to follow-up, with large effect sizes noted (ie, η2=0.34 for change in drinks per day across conditions). However, the differences between groups were not significant across the alcohol outcome variables. The only significant difference between conditions was in the readiness to change variable, with the bot group showing the greatest improvement in readiness (F2,147=5.6; P=.004; η2=0.07). The results suggested that the app group used the app for a longer duration (mean 50.71, SD 49.02 days) than the bot group (mean 27.16, SD 30.54 days; P=.02). Use of the interventions was shown to predict reduced drinking in a multiple regression analysis (β=.25, 95% CI 0.00-0.01; P=.04). Conclusions: Results indicated that all groups in this study reduced their drinking considerably from baseline to the 12-week follow-up, but no differences were found in the alcohol outcome variables between the groups, possibly because of a combination of small sample size and methodological issues. The app group reported greater use and slightly higher usability scores than the bot group, but the bot group demonstrated improved readiness to change scores over the app group. 
The strengths and limitations of the app and bot interventions as well as directions for future research are discussed. Trial Registration: ClinicalTrials.gov NCT04447794; https://clinicaltrials.gov/ct2/show/NCT04447794 UR - https://formative.jmir.org/2022/5/e33037 UR - http://dx.doi.org/10.2196/33037 UR - http://www.ncbi.nlm.nih.gov/pubmed/35576569 ID - info:doi/10.2196/33037 ER - TY - JOUR AU - Gudala, Meghana AU - Ross, Trail Mary Ellen AU - Mogalla, Sunitha AU - Lyons, Mandi AU - Ramaswamy, Padmavathy AU - Roberts, Kirk PY - 2022/4/28 TI - Benefits of, Barriers to, and Needs for an Artificial Intelligence-Powered Medication Information Voice Chatbot for Older Adults: Interview Study With Geriatrics Experts JO - JMIR Aging SP - e32169 VL - 5 IS - 2 KW - medication information KW - chatbot KW - older adults KW - technology capabilities KW - mobile phone N2 - Background: One of the most complicated medical needs of older adults is managing their complex medication regimens. However, the use of technology to aid older adults in this endeavor is impeded by the fact that their technological capabilities are lower than those of much of the rest of the population. What is needed to help manage medications is a technology that seamlessly integrates within their comfort levels, such as artificial intelligence agents. Objective: This study aimed to assess the benefits, barriers, and information needs that can be provided by an artificial intelligence-powered medication information voice chatbot for older adults. Methods: A total of 8 semistructured interviews were conducted with geriatrics experts. All interviews were audio-recorded and transcribed. Each interview was coded by 2 investigators (2 among ML, PR, METR, and KR) using a semiopen coding method for qualitative analysis, and reconciliation was performed by a third investigator. All codes were organized into the benefit/nonbenefit, barrier/nonbarrier, and need categories. 
Iterative recoding and member checking were performed until convergence was reached for all interviews. Results: The greatest benefits of a medication information voice-based chatbot would be helping to overcome the vision and dexterity hurdles experienced by most older adults, as it uses voice-based technology. It also helps to increase older adults' medication knowledge and adherence and supports their overall health. The main barriers were technology familiarity and cost, especially among older adults of lower socioeconomic status, as well as security and privacy concerns. It was noted, however, that technology familiarity was not an insurmountable barrier for older adults aged 65 to 75 years, who mostly owned smartphones, whereas older adults aged >75 years may have never been major users of technology in the first place. The most important needs were to be usable, to help patients with reminders, and to provide information on medication side effects and use instructions. Conclusions: Our needs analysis results derived from expert interviews clarify that a voice-based chatbot could be beneficial in improving adherence and overall health if it is built to serve the many medication information needs of older adults, such as reminders and instructions. However, the chatbot must be usable and affordable for its widespread use. 
UR - https://aging.jmir.org/2022/2/e32169 UR - http://dx.doi.org/10.2196/32169 UR - http://www.ncbi.nlm.nih.gov/pubmed/35482367 ID - info:doi/10.2196/32169 ER - TY - JOUR AU - Nißen, Marcia AU - Rüegger, Dominik AU - Stieger, Mirjam AU - Flückiger, Christoph AU - Allemand, Mathias AU - v Wangenheim, Florian AU - Kowatsch, Tobias PY - 2022/4/27 TI - The Effects of Health Care Chatbot Personas With Different Social Roles on the Client-Chatbot Bond and Usage Intentions: Development of a Design Codebook and Web-Based Study JO - J Med Internet Res SP - e32630 VL - 24 IS - 4 KW - chatbot KW - conversational agent KW - social roles KW - interpersonal closeness KW - social role theory KW - working alliance KW - design KW - persona KW - digital health intervention KW - web-based experiment N2 - Background: The working alliance refers to an important relationship quality between health professionals and clients that robustly links to treatment success. Recent research shows that clients can develop an affective bond with chatbots. However, few research studies have investigated whether this perceived relationship is affected by the social roles of differing closeness a chatbot can impersonate and by allowing users to choose the social role of a chatbot. Objective: This study aimed at understanding how the social role of a chatbot can be expressed using a set of interpersonal closeness cues and examining how these social roles affect clients' experiences and the development of an affective bond with the chatbot, depending on clients' characteristics (ie, age and gender) and whether they can freely choose a chatbot's social role. Methods: Informed by the social role theory and the social response theory, we developed a design codebook for chatbots with different social roles along an interpersonal closeness continuum. 
Based on this codebook, we manipulated a fictitious health care chatbot to impersonate one of four distinct social roles common in health care settings (institution, expert, peer, and dialogical self) and examined effects on perceived affective bond and usage intentions in a web-based lab study. The study included a total of 251 participants, whose mean age was 41.15 (SD 13.87) years; 57.0% (143/251) of the participants were female. Participants were either randomly assigned to one of the chatbot conditions (no choice: n=202, 80.5%) or could freely choose to interact with one of these chatbot personas (free choice: n=49, 19.5%). Separate multivariate analyses of variance were performed to analyze differences (1) between the chatbot personas within the no-choice group and (2) between the no-choice and the free-choice groups. Results: While the main effect of the chatbot persona on affective bond and usage intentions was insignificant (P=.87), we found differences based on participants' demographic profiles: main effects for gender (P=.04, ηp2=0.115) and age (P<.001, ηp2=0.192) and a significant interaction effect of persona and age (P=.01, ηp2=0.102). Participants younger than 40 years reported higher scores for affective bond and usage intentions for the interpersonally more distant expert and institution chatbots; participants 40 years or older reported higher outcomes for the closer peer and dialogical-self chatbots. The option to freely choose a persona significantly benefited perceptions of the peer chatbot further (eg, free-choice group affective bond: mean 5.28, SD 0.89; no-choice group affective bond: mean 4.54, SD 1.10; P=.003, ηp2=0.117). Conclusions: Manipulating a chatbot's social role is a possible avenue for health care chatbot designers to tailor clients' chatbot experiences using user-specific demographic factors and to improve clients' perceptions and behavioral intentions toward the chatbot. 
Our results also emphasize the benefits of letting clients freely choose between chatbots. UR - https://www.jmir.org/2022/4/e32630 UR - http://dx.doi.org/10.2196/32630 UR - http://www.ncbi.nlm.nih.gov/pubmed/35475761 ID - info:doi/10.2196/32630 ER - TY - JOUR AU - Boumans, Roel AU - van de Sande, Yana AU - Thill, Serge AU - Bosse, Tibor PY - 2022/4/25 TI - Voice-Enabled Intelligent Virtual Agents for People With Amnesia: Systematic Review JO - JMIR Aging SP - e32473 VL - 5 IS - 2 KW - intelligent virtual agent KW - amnesia KW - dementia KW - Alzheimer KW - systematic review KW - mobile phone N2 - Background: Older adults often have increasing memory problems (amnesia), and approximately 50 million people worldwide have dementia. This syndrome gradually affects a patient over a period of 10-20 years. Intelligent virtual agents may support people with amnesia. Objective: This study aims to identify state-of-the-art experimental studies with virtual agents on a screen capable of verbal dialogues with a target group of older adults with amnesia. Methods: We conducted a systematic search of PubMed, SCOPUS, Microsoft Academic, Google Scholar, Web of Science, and CrossRef on virtual agent and amnesia on papers that describe such experiments. Search criteria were (Virtual Agent OR Virtual Assistant OR Virtual Human OR Conversational Agent OR Virtual Coach OR Chatbot) AND (Amnesia OR Dementia OR Alzheimer OR Mild Cognitive Impairment). Risk of bias was evaluated using the QualSyst tool (University of Alberta), which scores 14 study quality items. Eligible studies are reported in a table including country, study design type, target sample size, controls, study aims, experiment population, intervention details, results, and an image of the agent. Results: A total of 8 studies was included in this meta-analysis. The average number of participants in the studies was 20 (SD 12). The verbal interactions were generally short. The usability was generally reported to be positive. 
In 7 (88%) of the 8 studies, human utterances were limited to short words or phrases that were predefined in the agent's speech recognition algorithm. The average study quality score was 0.69 (SD 0.08) on a scale of 0 to 1. Conclusions: The number of experimental studies on talking virtual agents that support people with memory problems is still small. The details on the verbal interaction are limited, which makes it difficult to assess the quality of the interaction and the possible effects of confounding parameters. In addition, the derivation of the aggregated data was difficult. Further research with extended and prolonged dialogues is required. UR - https://aging.jmir.org/2022/2/e32473 UR - http://dx.doi.org/10.2196/32473 UR - http://www.ncbi.nlm.nih.gov/pubmed/35468084 ID - info:doi/10.2196/32473 ER - TY - JOUR AU - Powell, Leigh AU - Nizam, Zayan Mohammed AU - Nour, Radwa AU - Zidoun, Youness AU - Sleibi, Randa AU - Kaladhara Warrier, Sreelekshmi AU - Al Suwaidi, Hanan AU - Zary, Nabil PY - 2022/4/19 TI - Conversational Agents in Health Education: Protocol for a Scoping Review JO - JMIR Res Protoc SP - e31923 VL - 11 IS - 4 KW - conversational agents KW - artificial intelligence chatbots KW - chatbots KW - health education KW - health promotion KW - classification KW - artificial intelligence assistants KW - conversational artificial intelligence N2 - Background: Conversational agents have the ability to reach people through multiple media, including the online space, mobile phones, and hardware devices like Alexa and Google Home. Conversational agents provide an engaging method of interaction while making information easier to access. Their emergence into areas related to public health and health education is perhaps unsurprising. While building conversational agents is becoming simpler over time, it still requires time and effort. 
There is also a lack of clarity and consistent terminology regarding what constitutes a conversational agent, how these agents are developed, and the kinds of resources that are needed to develop and sustain them. This lack of clarity creates a daunting task for those seeking to build conversational agents for health education initiatives. Objective: This scoping review aims to identify literature that reports on the design and implementation of conversational agents to promote and educate the public on matters related to health. We will categorize conversational agents in health education in alignment with current classifications and terminology emerging from the marketplace. We will clearly define the various levels of conversational agents, categorize currently existing agents within these levels, and describe the development models, tools, and resources being used to build conversational agents for health care education purposes. Methods: This scoping review will be conducted by employing the Arksey and O'Malley framework. We will also be adhering to the enhancements and updates proposed by Levac et al and Peters et al. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) extension for scoping reviews will guide the reporting of this scoping review. A systematic search for published and grey literature will be undertaken from the following databases: (1) PubMed, (2) PsycINFO, (3) Embase, (4) Web of Science, (5) SCOPUS, (6) CINAHL, (7) ERIC, (8) MEDLINE, and (9) Google Scholar. Data charting will be done using a structured format. Results: Initial searches of the databases retrieved 1305 results. The results will be presented in the final scoping review in a narrative and illustrative manner. 
Conclusions: This scoping review will report on conversational agents being used in health education today, and will include categorization of the levels of the agents and report on the kinds of tools, resources, and design and development methods used. International Registered Report Identifier (IRRID): DERR1-10.2196/31923 UR - https://www.researchprotocols.org/2022/4/e31923 UR - http://dx.doi.org/10.2196/31923 UR - http://www.ncbi.nlm.nih.gov/pubmed/35258006 ID - info:doi/10.2196/31923 ER - TY - JOUR AU - Sagstad, Haaland Mari AU - Morken, Nils-Halvdan AU - Lund, Agnethe AU - Dingsør, Jannike Linn AU - Nilsen, Vika Anne Britt AU - Sorbye, Marie Linn PY - 2022/4/18 TI - Quantitative User Data From a Chatbot Developed for Women With Gestational Diabetes Mellitus: Observational Study JO - JMIR Form Res SP - e28091 VL - 6 IS - 4 KW - chatbot KW - gestational diabetes mellitus KW - user data KW - log review KW - eHealth KW - diabetes KW - pregnancy KW - dialogue N2 - Background: The rising prevalence of gestational diabetes mellitus (GDM) calls for the use of innovative methods to inform and empower affected pregnant women. An information chatbot, Dina, was developed for women with GDM and is Norway's first health chatbot, integrated into the national digital health platform. Objective: The aim of this study is to investigate what kind of information users seek in a health chatbot providing support on GDM. Furthermore, we sought to explore when and how the chatbot is used, in terms of time of day and the number of questions in each dialogue, and to categorize the questions the chatbot was unable to answer (fallback). The overall goal is to explore quantitative user data in the chatbot's log, thereby contributing to further development of the chatbot. Methods: An observational study was designed. We used quantitative anonymous data (dialogues) from the chatbot's log and platform during an 8-week period in 2018 and a 12-week period in 2019 and 2020. 
Dialogues between the user and the chatbot were the unit of analysis. Questions from the users were categorized by theme. The time of day the dialogue occurred and the number of questions in each dialogue were registered, and questions resulting in a fallback message were identified. Results are presented using descriptive statistics. Results: We identified 610 dialogues with a total of 2838 questions during the 20 weeks of data collection. Questions regarding blood glucose, GDM, diet, and physical activity represented 58.81% (1669/2838) of all questions. In total, 58.0% (354/610) of dialogues occurred during daytime (8 AM to 3:59 PM), Monday through Friday. Most dialogues were short, containing 1-3 questions (340/610, 55.7%), and there was a decrease in dialogues containing 4-6 questions in the second period (P=.013). The chatbot was able to answer 88.51% (2512/2838) of all posed questions. The mean number of dialogues per week was 36 in the first period and 26.83 in the second period. Conclusions: Frequently asked questions seem to mirror the cornerstones of GDM treatment and may indicate that the chatbot offers users a low-threshold way to quickly access information already provided to them by the health care service. Our results underline the need to actively promote and integrate the chatbot into antenatal care as well as the importance of continuous content improvement in order to provide relevant information. 
UR - https://formative.jmir.org/2022/4/e28091 UR - http://dx.doi.org/10.2196/28091 UR - http://www.ncbi.nlm.nih.gov/pubmed/35436213 ID - info:doi/10.2196/28091 ER - TY - JOUR AU - Chew, Jocelyn Han Shi PY - 2022/4/13 TI - The Use of Artificial Intelligence-Based Conversational Agents (Chatbots) for Weight Loss: Scoping Review and Practical Recommendations JO - JMIR Med Inform SP - e32578 VL - 10 IS - 4 KW - chatbot KW - conversational agent KW - artificial intelligence KW - weight loss KW - obesity KW - overweight KW - natural language processing KW - sentiment analysis KW - machine learning KW - behavior change KW - mobile phone N2 - Background: Overweight and obesity have now reached a state of pandemic despite the clinical and commercial programs available. Artificial intelligence (AI) chatbots have strong potential for optimizing such programs for weight loss. Objective: This study aimed to review AI chatbot use cases for weight loss and to identify the essential components for prolonging user engagement. Methods: A scoping review was conducted using the 5-stage framework by Arksey and O'Malley. Articles were searched across 9 electronic databases (ACM Digital Library, CINAHL, Cochrane Central, Embase, IEEE Xplore, PsycINFO, PubMed, Scopus, and Web of Science) until July 9, 2021. Gray literature, reference lists, and Google Scholar were also searched. Results: A total of 23 studies with 2231 participants were included and evaluated in this review. Most studies (8/23, 35%) focused on using AI chatbots to promote both a healthy diet and exercise; 13% (3/23) of the studies used AI chatbots solely for lifestyle data collection and obesity risk assessment, whereas only 4% (1/23) of the studies focused on promoting a combination of a healthy diet, exercise, and stress management. 
In total, 48% (11/23) of the studies used only text-based AI chatbots, 52% (12/23) operationalized AI chatbots through smartphones, and 39% (9/23) integrated data collected through fitness wearables or Internet of Things appliances. The core functions of AI chatbots were to provide personalized recommendations (20/23, 87%), motivational messages (18/23, 78%), gamification (6/23, 26%), and emotional support (6/23, 26%). Study participants who experienced speech- and augmented reality–based chatbot interactions in addition to text-based chatbot interactions reported higher user engagement because of the convenience of hands-free interactions. Enabling conversations through multiple platforms (eg, SMS text messaging, Slack, Telegram, Signal, WhatsApp, or Facebook Messenger) and devices (eg, laptops, Google Home, and Amazon Alexa) was reported to increase user engagement. The human semblance of chatbots through verbal and nonverbal cues improved user engagement through interactivity and empathy. Other techniques used in text-based chatbots included personally and culturally appropriate colloquial tones and content; emojis that emulate human emotional expressions; positively framed words; citations of credible information sources; personification; validation; and the provision of real-time, fast, and reliable recommendations. Prevailing issues included privacy; accountability; user burden; and interoperability with other databases, third-party applications, social media platforms, devices, and appliances. Conclusions: AI chatbots should be designed to be human-like, personalized, contextualized, immersive, and enjoyable to enhance user experience, engagement, behavior change, and weight loss. 
These require the integration of health metrics (eg, based on self-reports and wearable trackers), personality and preferences (eg, based on goal achievements), circumstantial behaviors (eg, trigger-based overconsumption), and emotional states (eg, chatbot conversations and wearable stress detectors) to deliver personalized and effective recommendations for weight loss. UR - https://medinform.jmir.org/2022/4/e32578 UR - http://dx.doi.org/10.2196/32578 UR - http://www.ncbi.nlm.nih.gov/pubmed/35416791 ID - info:doi/10.2196/32578 ER - TY - JOUR AU - Laranjo, Liliana AU - Shaw, Tim AU - Trivedi, Ritu AU - Thomas, Stuart AU - Charlston, Emma AU - Klimis, Harry AU - Thiagalingam, Aravinda AU - Kumar, Saurabh AU - Tan, C. Timothy AU - Nguyen, N. Tu AU - Marschner, Simone AU - Chow, Clara PY - 2022/4/13 TI - Coordinating Health Care With Artificial Intelligence–Supported Technology for Patients With Atrial Fibrillation: Protocol for a Randomized Controlled Trial JO - JMIR Res Protoc SP - e34470 VL - 11 IS - 4 KW - atrial fibrillation KW - interactive voice response KW - artificial intelligence KW - conversational agent KW - mobile phone N2 - Background: Atrial fibrillation (AF) is an increasingly common chronic health condition for which integrated care that is multidisciplinary and patient-centric is recommended yet challenging to implement. Objective: The aim of Coordinating Health Care With Artificial Intelligence–Supported Technology in AF is to evaluate the feasibility and potential efficacy of a digital intervention (AF-Support) comprising preprogrammed automated telephone calls (artificial intelligence conversational technology), SMS text messages, and emails, as well as an educational website, to support patients with AF in self-managing their condition and coordinate primary and secondary care follow-up. 
Methods: Coordinating Health Care With Artificial Intelligence–Supported Technology in AF is a 6-month randomized controlled trial of adult patients with AF (n=385), who will be allocated in a ratio of 4:1 to AF-Support or usual care, with postintervention semistructured interviews. The primary outcome is AF-related quality of life, and the secondary outcomes include cardiovascular risk factors, outcomes, and health care use. The 4:1 allocation design enables a detailed examination of the feasibility, uptake, and process of the implementation of AF-Support. Participants with new or ongoing AF will be recruited from hospitals and specialist-led clinics in Sydney, New South Wales, Australia. AF-Support has been co-designed with clinicians, researchers, information technologists, and patients. Automated telephone calls will occur 7 times, with the first call triggered to commence 24 to 48 hours after enrollment. Calls follow a standard flow but are customized to vary depending on patients' responses. Calls assess AF symptoms, and participants' responses will trigger different system responses based on prespecified protocols, including the identification of red flags requiring escalation. Randomization will be performed electronically, and allocation concealment will be ensured. Because of the nature of this trial, only outcome assessors and data analysts will be blinded. For the primary outcome, groups will be compared using an analysis of covariance adjusted for corresponding baseline values. Randomized trial data analysis will be performed according to the intention-to-treat principle, and qualitative data will be thematically analyzed. Results: Ethics approval was granted by the Western Sydney Local Health District Human Ethics Research Committee, and recruitment started in December 2020. As of December 2021, a total of 103 patients had been recruited. 
Conclusions: This study will address the gap in knowledge with respect to the role of postdischarge digital care models for supporting patients with AF. Trial Registration: Australian New Zealand Clinical Trials Registry ACTRN12621000174886; https://www.australianclinicaltrials.gov.au/anzctr/trial/ACTRN12621000174886 International Registered Report Identifier (IRRID): DERR1-10.2196/34470 UR - https://www.researchprotocols.org/2022/4/e34470 UR - http://dx.doi.org/10.2196/34470 UR - http://www.ncbi.nlm.nih.gov/pubmed/35416784 ID - info:doi/10.2196/34470 ER - TY - JOUR AU - Gupta, Megha AU - Malik, Tanya AU - Sinha, Chaitali PY - 2022/3/31 TI - Delivery of a Mental Health Intervention for Chronic Pain Through an Artificial Intelligence–Enabled App (Wysa): Protocol for a Prospective Pilot Study JO - JMIR Res Protoc SP - e36910 VL - 11 IS - 3 KW - chronic pain KW - AI-enabled mental health assistant KW - digital health intervention KW - mental health conversational agent KW - artificial intelligence KW - depression KW - mental health KW - anxiety KW - health care cost KW - conversational agent KW - chatbot KW - digital health N2 - Background: Patients with chronic pain often experience coexisting, long-term and debilitating mental health comorbidities such as depression and anxiety. Artificial intelligence–supported cognitive behavioral therapy (AI-CBT) interventions could offer cost-effective, accessible, and potentially effective resources to address this problem. However, little research has been conducted on the efficacy of AI-CBT interventions for chronic pain. Objective: This prospective cohort study aims to examine the efficacy and use of an AI-CBT intervention for chronic pain (Wysa for Chronic Pain app, Wysa Inc) using a conversational agent (with no human intervention). To the best of our knowledge, this is the first such study for chronic pain using a fully automated, free-text–based conversational agent. 
Methods: Participants with self-reported chronic pain (n=500) will be recruited online on a rolling basis from April 2022 through posts on US-based internet communities within this prospective cohort. Informed consent is received from participants within the app, and the Wysa for Chronic Pain intervention is delivered remotely for 8 weeks. Outcome measures, including a numeric pain rating scale and the Patient-Reported Outcomes Measurement Information System–Pain Interference, Generalized Anxiety Disorder–7, and Patient Health Questionnaire–9 questionnaires, will be administered to test the effectiveness of the intervention in reducing levels of pain interference, depression, and anxiety. The therapeutic alliance created with the conversational agent will be assessed through the Working Alliance Inventory–Short Revised instrument. Retention and use statistics will be observed for adherence and engagement. Results: The study will open for recruitment in April 2022, and data collection is expected to be completed by August 2022. The results for the primary outcomes are expected to be published by late 2022. Conclusions: Mental health conversational agents driven by artificial intelligence could be effective in helping patients with chronic pain learn to self-manage their pain and common comorbidities like depression and anxiety. The Wysa for Chronic Pain app is one such digital intervention that can potentially serve as a solution to the problems of affordability and scalability associated with interventions that include a human therapist. This prospective study examines the efficacy of the app as a treatment solution for chronic pain. It aims to inform future practices and digital mental health interventions for individuals with chronic pain. 
International Registered Report Identifier (IRRID): PRR1-10.2196/36910 UR - https://www.researchprotocols.org/2022/3/e36910 UR - http://dx.doi.org/10.2196/36910 UR - http://www.ncbi.nlm.nih.gov/pubmed/35314423 ID - info:doi/10.2196/36910 ER - TY - JOUR AU - Tanaka, Hiroki AU - Nakamura, Satoshi PY - 2022/3/29 TI - The Acceptability of Virtual Characters as Social Skills Trainers: Usability Study JO - JMIR Hum Factors SP - e35358 VL - 9 IS - 1 KW - social skills training KW - virtual agent design KW - virtual assistant KW - virtual trainer KW - chatbot KW - acceptability KW - realism KW - virtual agent KW - simulation KW - social skill KW - social interaction KW - design KW - training KW - crowdsourcing N2 - Background: Social skills training by human trainers is a well-established method to provide appropriate social interaction skills and strengthen social self-efficacy. In our previous work, we attempted to automate social skills training by developing a virtual agent that taught social skills through interaction. Previous research has not investigated the visual design of virtual agents for social skills training. Thus, we investigated the effect of virtual agent visual design on automated social skills training. Objective: The 3 main purposes of this research were to investigate the effect of virtual agent appearance on automated social skills training, the relationship between acceptability and other measures (eg, likeability, realism, and familiarity), and the relationship between likeability and individual user characteristics (eg, gender, age, and autistic traits). Methods: We prepared images and videos of a virtual agent, and 1218 crowdsourced workers rated the virtual agents through a questionnaire. In designing personalized virtual agents, we investigated the acceptability, likeability, and other impressions of the virtual agents and their relationship to individual characteristics. 
Results: We found that there were differences between the virtual agents in all measures (P<.001). A female anime-type virtual agent was rated as the most likeable. We also confirmed that participants' gender, age, and autistic traits were related to their ratings. Conclusions: We confirmed the effect of virtual agent design on automated social skills training. Our findings are important in designing the appearance of an agent for use in personalized automated social skills training. UR - https://humanfactors.jmir.org/2022/1/e35358 UR - http://dx.doi.org/10.2196/35358 UR - http://www.ncbi.nlm.nih.gov/pubmed/35348468 ID - info:doi/10.2196/35358 ER - TY - JOUR AU - Ahmed, Arfan AU - Aziz, Sarah AU - Khalifa, Mohamed AU - Shah, Uzair AU - Hassan, Asma AU - Abd-Alrazaq, Alaa AU - Househ, Mowafa PY - 2022/3/11 TI - Thematic Analysis on User Reviews for Depression and Anxiety Chatbot Apps: Machine Learning Approach JO - JMIR Form Res SP - e27654 VL - 6 IS - 3 KW - anxiety KW - depression KW - chatbots KW - conversational agents KW - topic modeling KW - latent Dirichlet allocation KW - thematic analysis KW - mobile phone N2 - Background: Anxiety and depression are among the most commonly prevalent mental health disorders worldwide. Chatbot apps can play an important role in relieving anxiety and depression. Users' reviews of chatbot apps are considered an important source of data for exploring users' opinions and satisfaction. Objective: This study aims to explore users' opinions, satisfaction, and attitudes toward anxiety and depression chatbot apps by conducting a thematic analysis of users' reviews of 11 anxiety and depression chatbot apps collected from the Google Play Store and Apple App Store. In addition, we propose a workflow to provide a methodological approach for future analysis of app review comments. Methods: We analyzed 205,581 user review comments from chatbots designed for users with anxiety and depression symptoms. 
Using scraper tools and the Google Play Scraper and App Store Scraper Python libraries, we extracted the text and metadata. The reviews were divided into positive and negative meta-themes based on users' rating per review. We analyzed the reviews using word frequencies of bigrams and words in pairs. A topic modeling technique, latent Dirichlet allocation, was applied to identify topics in the reviews and analyzed to detect themes and subthemes. Results: Thematic analysis was conducted on 5 topics for each sentimental set. Reviews were categorized as positive or negative. For positive reviews, the main themes were confidence and affirmation building, adequate analysis and consultation, caring as a friend, and ease of use. For negative reviews, the results revealed the following themes: usability issues, update issues, privacy, and noncreative conversations. Conclusions: Using a machine learning approach, we were able to analyze approximately 200,000 comments and categorize them into themes, allowing us to observe users' expectations effectively despite some negative factors. A methodological workflow is provided for the future analysis of review comments. UR - https://formative.jmir.org/2022/3/e27654 UR - http://dx.doi.org/10.2196/27654 UR - http://www.ncbi.nlm.nih.gov/pubmed/35275069 ID - info:doi/10.2196/27654 ER - TY - JOUR AU - Chan, W. William AU - Fitzsimmons-Craft, E. Ellen AU - Smith, C. Arielle AU - Firebaugh, Marie-Laure AU - Fowler, A. Lauren AU - DePietro, Bianca AU - Topooco, Naira AU - Wilfley, E. Denise AU - Taylor, Barr C. AU - Jacobson, C. 
Nicholas PY - 2022/1/19 TI - The Challenges in Designing a Prevention Chatbot for Eating Disorders: Observational Study JO - JMIR Form Res SP - e28003 VL - 6 IS - 1 KW - chatbot KW - eating disorders KW - digital mental health KW - prevention KW - intervention development N2 - Background: Chatbots have the potential to provide cost-effective mental health prevention programs at scale and increase interactivity, ease of use, and accessibility of intervention programs. Objective: The development of chatbot prevention for eating disorders (EDs) is still in its infancy. Our aim is to present examples of and solutions to challenges in designing and refining a rule-based prevention chatbot program for EDs, targeted at adult women at risk for developing an ED. Methods: Participants were 2409 individuals who at least began to use an ED prevention chatbot in response to social media advertising. Over 6 months, the research team reviewed up to 52,129 comments from these users to identify technical glitches and inappropriate responses that negatively impacted users' experience. Problems identified by reviewers were then presented to the entire research team, who then generated possible solutions and implemented new responses. Results: The most common problem with the chatbot was a general limitation in understanding and responding appropriately to unanticipated user responses. We developed several workarounds to limit these problems while retaining some interactivity. Conclusions: Rule-based chatbots have the potential to reach large populations at low cost but are limited in understanding and responding appropriately to unanticipated user responses. They can be most effective in providing information and simple conversations. Workarounds can reduce conversation errors. 
UR - https://formative.jmir.org/2022/1/e28003 UR - http://dx.doi.org/10.2196/28003 UR - http://www.ncbi.nlm.nih.gov/pubmed/35044314 ID - info:doi/10.2196/28003 ER - TY - JOUR AU - Wang, Hua AU - Gupta, Sneha AU - Singhal, Arvind AU - Muttreja, Poonam AU - Singh, Sanghamitra AU - Sharma, Poorva AU - Piterova, Alice PY - 2022/1/3 TI - An Artificial Intelligence Chatbot for Young People's Sexual and Reproductive Health in India (SnehAI): Instrumental Case Study JO - J Med Internet Res SP - e29969 VL - 24 IS - 1 KW - artificial intelligence KW - chatbot KW - Facebook KW - affordance KW - sex education KW - sexual and reproductive health KW - contraception KW - case study KW - young people KW - India KW - transmedia KW - mobile apps KW - mobile health KW - technology design KW - user engagement KW - digital health KW - mobile phone N2 - Background: Leveraging artificial intelligence (AI)–driven apps for health education and promotion can help in the accomplishment of several United Nations sustainable development goals. SnehAI, developed by the Population Foundation of India, is the first Hinglish (Hindi + English) AI chatbot, deliberately designed for social and behavioral changes in India. It provides a private, nonjudgmental, and safe space to spur conversations about taboo topics (such as safe sex and family planning) and offers accurate, relatable, and trustworthy information and resources. Objective: This study aims to use the Gibson theory of affordances to examine SnehAI and offer scholarly guidance on how AI chatbots can be used to educate adolescents and young adults, promote sexual and reproductive health, and advocate for the health entitlements of women and girls in India. Methods: We adopted an instrumental case study approach that allowed us to explore SnehAI from the perspectives of technology design, program implementation, and user engagement. We also used a mix of qualitative insights and quantitative analytics data to triangulate our findings. 
Results: SnehAI demonstrated strong evidence across 15 functional affordances: accessibility, multimodality, nonlinearity, compellability, queriosity, editability, visibility, interactivity, customizability, trackability, scalability, glocalizability, inclusivity, connectivity, and actionability. SnehAI also effectively engaged its users, especially young men, with 8.2 million messages exchanged across a 5-month period. Almost half of the incoming user messages were texts of deeply personal questions and concerns about sexual and reproductive health, as well as allied topics. Overall, SnehAI successfully presented itself as a trusted friend and mentor; the curated content was both entertaining and educational, and the natural language processing system worked effectively to personalize the chatbot response and optimize user experience. Conclusions: SnehAI represents an innovative, engaging, and educational intervention that enables vulnerable and hard-to-reach population groups to talk and learn about sensitive and important issues. SnehAI is a powerful testimonial of the vital potential that lies in AI technologies for social good. UR - https://www.jmir.org/2022/1/e29969 UR - http://dx.doi.org/10.2196/29969 UR - http://www.ncbi.nlm.nih.gov/pubmed/34982034 ID - info:doi/10.2196/29969 ER - TY - JOUR AU - Curtis, G. Rachel AU - Bartel, Bethany AU - Ferguson, Ty AU - Blake, T. Henry AU - Northcott, Celine AU - Virgara, Rosa AU - Maher, A. Carol PY - 2021/12/21 TI - Improving User Experience of Virtual Health Assistants: Scoping Review JO - J Med Internet Res SP - e31737 VL - 23 IS - 12 KW - virtual assistant KW - conversational agent KW - chatbot KW - eHealth KW - digital health KW - design KW - user experience KW - mobile phone N2 - Background: Virtual assistants can be used to deliver innovative health programs that provide appealing, personalized, and convenient health advice and support at scale and low cost. 
Design characteristics that influence the look and feel of the virtual assistant, such as visual appearance or language features, may significantly influence users' experience and engagement with the assistant. Objective: This scoping review aims to provide an overview of experimental research examining how the design characteristics of virtual health assistants affect user experience, summarize the findings of that research, and provide recommendations for the design of virtual health assistants where sufficient evidence exists. Methods: We searched 5 electronic databases (Web of Science, MEDLINE, Embase, PsycINFO, and ACM Digital Library) to identify the studies that used an experimental design to compare the effects of design characteristics between 2 or more versions of an interactive virtual health assistant on user experience among adults. Data were synthesized descriptively. Health domains, design characteristics, and outcomes were categorized, and descriptive statistics were used to summarize the body of research. Results for each study were categorized as positive, negative, or no effect, and a matrix of the design characteristics and outcome categories was constructed to summarize the findings. Results: The database searches identified 6879 articles after the removal of duplicates. We included 48 articles representing 45 unique studies in the review. The most common health domains were mental health and physical activity. Studies most commonly examined design characteristics in the categories of visual design or conversational style and relational behavior and assessed outcomes in the categories of personality, satisfaction, relationship, or use intention. Over half of the design characteristics were examined by only 1 study. Results suggest that empathy and relational behavior and self-disclosure are related to more positive user experience. 
Results also suggest that if a human-like avatar is used, realistic rendering and medical attire may be related to more positive user experience; however, more research is needed to confirm this. Conclusions: There is a growing body of scientific evidence examining the impact of virtual health assistants' design characteristics on user experience. Taken together, data suggest that the look and feel of a virtual health assistant does affect user experience. Virtual health assistants that show empathy, display nonverbal relational behaviors, and disclose personal information about themselves achieve better user experience. At present, the evidence base is broad, and the studies are typically small in scale and highly heterogeneous. Further research, particularly using longitudinal research designs with repeated user interactions, is needed to inform the optimal design of virtual health assistants. UR - https://www.jmir.org/2021/12/e31737 UR - http://dx.doi.org/10.2196/31737 UR - http://www.ncbi.nlm.nih.gov/pubmed/34931997 ID - info:doi/10.2196/31737 ER - TY - JOUR AU - Dhinagaran, Ardhithy Dhakshenya AU - Sathish, Thirunavukkarasu AU - Soong, AiJia AU - Theng, Yin-Leng AU - Best, James AU - Tudor Car, Lorainne PY - 2021/12/3 TI - Conversational Agent for Healthy Lifestyle Behavior Change: Web-Based Feasibility Study JO - JMIR Form Res SP - e27956 VL - 5 IS - 12 KW - chatbot KW - conversational agents KW - behavior change KW - healthy lifestyle behavior change KW - pilot study KW - feasibility trial KW - usability KW - acceptability KW - preliminary efficacy KW - mobile phone N2 - Background: The rising incidence of chronic diseases is a growing concern, especially in Singapore, which is one of the high-income countries with the highest prevalence of diabetes. 
Interventions that promote healthy lifestyle behavior changes have been proven to be effective in reducing the progression of prediabetes to diabetes, but their in-person delivery may not be feasible on a large scale. Novel technologies such as conversational agents are a potential alternative for delivering behavioral interventions that promote healthy lifestyle behavior changes to the public. Objective: The aim of this study is to assess the feasibility and acceptability of using a conversational agent promoting healthy lifestyle behavior changes in the general population in Singapore. Methods: We performed a web-based, single-arm feasibility study. The participants were recruited through Facebook over 4 weeks. The Facebook Messenger conversational agent was used to deliver the intervention. The conversations focused on diet, exercise, sleep, and stress and aimed to promote healthy lifestyle behavior changes and improve the participants' knowledge of diabetes. Messages were sent to the participants four times a week (once for each of the 4 topics of focus) for 4 weeks. We assessed the feasibility of recruitment, defined as at least 75% (150/200) of our target sample of 200 participants in 4 weeks, as well as retention, defined as 33% (66/200) of the recruited sample completing the study. We also assessed the participants' satisfaction with, and usability of, the conversational agent. In addition, we performed baseline and follow-up assessments of quality of life, diabetes knowledge and risk perception, diet, exercise, sleep, and stress. Results: We recruited 37.5% (75/200) of the target sample size in 1 month. Of the 75 eligible participants, 60 (80%) provided digital informed consent and completed baseline assessments. Of these 60 participants, 56 (93%) followed the study through to completion. Retention was high at 93% (56/60), along with engagement, denoted by 50% (30/60) of the participants communicating with the conversational agent at each interaction. 
Acceptability, usability, and satisfaction were generally high. Preliminary efficacy assessment of the intervention showed no definitive improvements in health-related behavior. Conclusions: The delivery of a conversational agent for healthy lifestyle behavior change through Facebook Messenger was feasible and acceptable. We were unable to recruit our planned sample using only the free options on Facebook. However, participant retention and conversational agent engagement rates were high. Our findings provide important insights to inform the design of a future randomized controlled trial. UR - https://formative.jmir.org/2021/12/e27956 UR - http://dx.doi.org/10.2196/27956 UR - http://www.ncbi.nlm.nih.gov/pubmed/34870611 ID - info:doi/10.2196/27956 ER - TY - JOUR AU - Petracca, Francesco AU - Tempre, Rosaria AU - Cucciniello, Maria AU - Ciani, Oriana AU - Pompeo, Elena AU - Sannino, Luigi AU - Lovato, Valeria AU - Castaman, Giancarlo AU - Ghirardini, Alessandra AU - Tarricone, Rosanna PY - 2021/12/1 TI - An Electronic Patient-Reported Outcome Mobile App for Data Collection in Type A Hemophilia: Design and Usability Study JO - JMIR Form Res SP - e25071 VL - 5 IS - 12 KW - mobile apps KW - mHealth KW - hemophilia A KW - rare diseases KW - usability KW - user-centered design KW - design science KW - mobile phone N2 - Background: There is currently limited evidence on the level and intensity of physical activity in individuals with hemophilia A. Mobile technologies can offer a rigorous and reliable alternative to support data collection processes but they are often associated with poor user retention. The lack of longitudinal continuity in their use can be partly attributed to the insufficient consideration of stakeholder inputs in the development process of mobile apps. Several user-centered models have been proposed to guarantee that a thorough knowledge of the end user needs is considered in the development process of mobile apps. 
Objective: The aim of this study is to design and validate an electronic patient-reported outcome mobile app that requires sustained active input by individuals during POWER, an observational study that aims at evaluating the relationship between physical activity levels and bleeding in patients with hemophilia A. Methods: We adopted a user-centered design and engaged several stakeholders in the development and usability testing of this mobile app. During the concept generation and ideation phase, we organized a need-assessment focus group (FG) with patient representatives to elicit specific design requirements for the end users. We then conducted 2 exploratory FGs to seek additional inputs for the app's improvement and 2 confirmatory FGs to validate the app and test its usability in the field through the mobile health app usability questionnaire. Results: The findings from the thematic analysis of the need-assessment FG revealed that there was a demand for sense making, for simplification of app functionalities, for maximizing integration, and for minimizing the feeling of external control. Participants involved in the later stages of the design refinement contributed to improving the design further by upgrading the app's layout and making the experience with the app more efficient through functions such as chatbots and visual feedback on the number of hours a wearable device had been worn, to ensure that the observed data were actually registered. The end users rated the app highly during the quantitative assessment, with an average mobile health app usability questionnaire score of 5.32 (SD 0.66; range 4.44-6.23) and 6.20 (SD 0.43; range 5.72-6.88) out of 7 in the 2 iterative usability testing cycles. Conclusions: The results of the usability test indicated a high, growing satisfaction with the electronic patient-reported outcome app. 
The adoption of a thorough user-centered design process using several types of FGs helped maximize the likelihood of sustained retention of the app?s users and made it fit for data collection of relevant outcomes in the observational POWER study. The continuous use of the app and the actual level of engagement will be evaluated during the ongoing trial. Trial Registration: ClinicalTrials.gov NCT04165135; https://clinicaltrials.gov/ct2/show/NCT04165135 UR - https://formative.jmir.org/2021/12/e25071 UR - http://dx.doi.org/10.2196/25071 UR - http://www.ncbi.nlm.nih.gov/pubmed/34855619 ID - info:doi/10.2196/25071 ER - TY - JOUR AU - Xu, Lu AU - Sanders, Leslie AU - Li, Kay AU - Chow, L. James C. PY - 2021/11/29 TI - Chatbot for Health Care and Oncology Applications Using Artificial Intelligence and Machine Learning: Systematic Review JO - JMIR Cancer SP - e27850 VL - 7 IS - 4 KW - chatbot KW - artificial intelligence KW - machine learning KW - health KW - medicine KW - communication KW - diagnosis KW - cancer therapy KW - ethics KW - medical biophysics KW - mobile phone N2 - Background: Chatbot is a timely topic applied in various fields, including medicine and health care, for human-like knowledge transfer and communication. Machine learning, a subset of artificial intelligence, has been proven particularly applicable in health care, with the ability for complex dialog management and conversational flexibility. Objective: This review article aims to report on the recent advances and current trends in chatbot technology in medicine. A brief historical overview, along with the developmental progress and design characteristics, is first introduced. The focus will be on cancer therapy, with in-depth discussions and examples of diagnosis, treatment, monitoring, patient support, workflow efficiency, and health promotion. 
In addition, this paper will explore the limitations and areas of concern, highlighting ethical, moral, security, technical, and regulatory standards and evaluation issues to explain the hesitancy in implementation. Methods: A search of the literature published in the past 20 years was conducted using the IEEE Xplore, PubMed, Web of Science, Scopus, and OVID databases. The screening of chatbots was guided by the open-access Botlist directory for health care components and further divided according to the following criteria: diagnosis, treatment, monitoring, support, workflow, and health promotion. Results: Even after addressing these issues and establishing the safety or efficacy of chatbots, human elements in health care will not be replaceable. Therefore, chatbots have the potential to be integrated into clinical practice by working alongside health practitioners to reduce costs, refine workflow efficiencies, and improve patient outcomes. Other applications in pandemic support, global health, and education are yet to be fully explored. Conclusions: Further research and interdisciplinary collaboration could advance this technology to dramatically improve the quality of care for patients, rebalance the workload for clinicians, and revolutionize the practice of medicine. UR - https://cancer.jmir.org/2021/4/e27850 UR - http://dx.doi.org/10.2196/27850 UR - http://www.ncbi.nlm.nih.gov/pubmed/34847056 ID - info:doi/10.2196/27850 ER - TY - JOUR AU - To, G. 
Quyen AU - Green, Chelsea AU - Vandelanotte, Corneel PY - 2021/11/26 TI - Feasibility, Usability, and Effectiveness of a Machine Learning–Based Physical Activity Chatbot: Quasi-Experimental Study JO - JMIR Mhealth Uhealth SP - e28577 VL - 9 IS - 11 KW - conversational agent KW - virtual coach KW - intervention KW - exercise KW - acceptability KW - mobile phone N2 - Background: Behavioral eHealth and mobile health interventions have been moderately successful in increasing physical activity, although opportunities for further improvement remain to be discussed. Chatbots equipped with natural language processing can interact and engage with users and help continuously monitor physical activity by using data from wearable sensors and smartphones. However, a limited number of studies have evaluated the effectiveness of chatbot interventions on physical activity. Objective: This study aims to investigate the feasibility, usability, and effectiveness of a machine learning–based physical activity chatbot. Methods: A quasi-experimental design without a control group was conducted with outcomes evaluated at baseline and 6 weeks. Participants wore a Fitbit Flex 1 (Fitbit LLC) and connected to the chatbot via the Messenger app. The chatbot provided daily updates on the physical activity level for self-monitoring, sent out daily motivational messages in relation to goal achievement, and automatically adjusted the daily goals based on physical activity levels in the last 7 days. When requested by the participants, the chatbot also provided sources of information on the benefits of physical activity, sent general motivational messages, and checked participants' activity history (ie, the step counts/min that were achieved on any day). Information about usability and acceptability was self-reported. The main outcomes were daily step counts recorded by the Fitbit and self-reported physical activity.
Results: Among 116 participants, 95 (81.9%) were female, 85 (73.3%) were in a relationship, 101 (87.1%) were White, and 82 (70.7%) were full-time workers. Their average age was 49.1 (SD 9.3) years with an average BMI of 32.5 (SD 8.0) kg/m2. Most experienced technical issues were due to an unexpected change in Facebook policy (93/113, 82.3%). Most of the participants scored the usability of the chatbot (101/113, 89.4%) and the Fitbit (99/113, 87.6%) as at least "OK." About one-third (40/113, 35.4%) would continue to use the chatbot in the future, and 53.1% (60/113) agreed that the chatbot helped them become more active. On average, 6.7 (SD 7.0) messages/week were sent to the chatbot and 5.1 (SD 7.4) min/day were spent using the chatbot. At follow-up, participants recorded more steps (increase of 627, 95% CI 219-1035 steps/day) and total physical activity (increase of 154.2 min/week; 3.58 times higher at follow-up; 95% CI 2.28-5.63). Participants were also more likely to meet the physical activity guidelines (odds ratio 6.37, 95% CI 3.31-12.27) at follow-up. Conclusions: The machine learning–based physical activity chatbot was able to significantly increase participants' physical activity and was moderately accepted by the participants. However, the Facebook policy change undermined the chatbot functionality and indicated the need to use independent platforms for chatbot deployment to ensure successful delivery of this type of intervention.
UR - https://mhealth.jmir.org/2021/11/e28577 UR - http://dx.doi.org/10.2196/28577 UR - http://www.ncbi.nlm.nih.gov/pubmed/34842552 ID - info:doi/10.2196/28577 ER - TY - JOUR AU - Loveys, Kate AU - Sagar, Mark AU - Zhang, Xueyuan AU - Fricchione, Gregory AU - Broadbent, Elizabeth PY - 2021/11/25 TI - Effects of Emotional Expressiveness of a Female Digital Human on Loneliness, Stress, Perceived Support, and Closeness Across Genders: Randomized Controlled Trial JO - J Med Internet Res SP - e30624 VL - 23 IS - 11 KW - computer agent KW - digital human KW - emotional expressiveness KW - loneliness KW - closeness KW - social support KW - stress KW - human-computer interaction KW - voice KW - face KW - physiology N2 - Background: Loneliness is a growing public health problem that has been exacerbated in vulnerable groups during the COVID-19 pandemic. Social support interventions have been shown to reduce loneliness, including when delivered through technology. Digital humans are a new type of computer agent that show promise as supportive peers in health care. For digital humans to be effective and engaging support persons, it is important that they develop closeness with people. Closeness can be increased by emotional expressiveness, particularly in female relationships. However, it is unknown whether emotional expressiveness improves relationships with digital humans and affects physiological responses. Objective: The aim of this study is to investigate whether emotional expression by a digital human can affect psychological and physiological outcomes and whether the effects are moderated by the user's gender. Methods: A community sample of 198 adults (101 women, 95 men, and 2 gender-diverse individuals) was block-randomized by gender to complete a 15-minute self-disclosure conversation with a female digital human in 1 of 6 conditions.
In these conditions, the digital human varied in modality richness and emotional expression on the face and in the voice (emotional, neutral, or no face; emotional or neutral voice). Perceived loneliness, closeness, social support, caring perceptions, and stress were measured after each interaction. Heart rate, skin temperature, and electrodermal activity were assessed during each interaction. 3-way factorial analyses of variance with post hoc tests were conducted. Results: Emotional expression in the voice was associated with greater perceptions of caring and physiological arousal during the interaction, and unexpectedly, with lower feelings of support. User gender moderated the effect of emotional expressiveness on several outcomes. For women, an emotional voice was associated with increased closeness, social support, and caring perceptions, whereas for men, a neutral voice increased these outcomes. For women, interacting with a neutral face was associated with lower loneliness and subjective stress compared with no face. Interacting with no face (ie, a voice-only black screen) resulted in lower loneliness and subjective stress for men, compared with a neutral or emotional face. No significant results were found for heart rate or skin temperature. However, average electrodermal activity was significantly higher for men while interacting with an emotional voice. Conclusions: Emotional expressiveness in a female digital human has different effects on loneliness, social, and physiological outcomes for men and women. The results inform the design of digital human support persons and have theoretical implications. Further research is needed to evaluate how more pronounced emotional facial expressions in a digital human might affect the results. 
Trial Registration: Australia New Zealand Clinical Trials Registry (ANZCTR) ACTRN12621000865819; https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=381816&isReview UR - https://www.jmir.org/2021/11/e30624 UR - http://dx.doi.org/10.2196/30624 UR - http://www.ncbi.nlm.nih.gov/pubmed/34842540 ID - info:doi/10.2196/30624 ER - TY - JOUR AU - Chavez-Yenter, Daniel AU - Kimball, E. Kadyn AU - Kohlmann, Wendy AU - Lorenz Chambers, Rachelle AU - Bradshaw, L. Richard AU - Espinel, F. Whitney AU - Flynn, Michael AU - Gammon, Amanda AU - Goldberg, Eric AU - Hagerty, J. Kelsi AU - Hess, Rachel AU - Kessler, Cecilia AU - Monahan, Rachel AU - Temares, Danielle AU - Tobik, Katie AU - Mann, M. Devin AU - Kawamoto, Kensaku AU - Del Fiol, Guilherme AU - Buys, S. Saundra AU - Ginsburg, Ophira AU - Kaphingst, A. Kimberly PY - 2021/11/18 TI - Patient Interactions With an Automated Conversational Agent Delivering Pretest Genetics Education: Descriptive Study JO - J Med Internet Res SP - e29447 VL - 23 IS - 11 KW - cancer KW - genetic testing KW - virtual conversational agent KW - user interaction KW - smartphone KW - mobile phone N2 - Background: Cancer genetic testing to assess an individual's cancer risk and to enable genomics-informed cancer treatment has grown exponentially in the past decade. Because of this continued growth and a shortage of health care workers, there is a need for automated strategies that provide high-quality genetics services to patients to reduce the clinical demand for genetics providers. Conversational agents have shown promise in managing mental health, pain, and other chronic conditions and are increasingly being used in cancer genetic services. However, research on how patients interact with these agents to satisfy their information needs is limited. Objective: Our primary aim is to assess user interactions with a conversational agent for pretest genetics education.
Methods: We conducted a feasibility study of user interactions with a conversational agent that delivers pretest genetics education to primary care patients without cancer who are eligible for cancer genetic evaluation. The conversational agent provided scripted content similar to that delivered in a pretest genetic counseling visit for cancer genetic testing. Outside of a core set of information delivered to all patients, users were able to navigate within the chat to request additional content in their areas of interest. An artificial intelligence–based preprogrammed library was also established to allow users to ask open-ended questions to the conversational agent. Transcripts of the interactions were recorded. Here, we describe the information selected, time spent to complete the chat, and use of the open-ended question feature. Descriptive statistics were used for quantitative measures, and thematic analyses were used for qualitative responses. Results: We invited 103 patients to participate, of which 88.3% (91/103) were offered access to the conversational agent, 39% (36/91) started the chat, and 32% (30/91) completed the chat. Most users who completed the chat indicated that they wanted to continue with genetic testing (21/30, 70%), few were unsure (9/30, 30%), and no patient declined to move forward with testing. Those who decided to test spent an average of 10 (SD 2.57) minutes on the chat, selected an average of 1.87 (SD 1.2) additional pieces of information, and generally did not ask open-ended questions. Those who were unsure spent 4 more minutes on average (mean 14.1, SD 7.41; P=.03) on the chat, selected an average of 3.67 (SD 2.9) additional pieces of information, and asked at least one open-ended question. Conclusions: The pretest chat provided enough information for most patients to decide on cancer genetic testing, as indicated by the small number of open-ended questions.
A subset of participants were still unsure about receiving genetic testing and may require additional education or interpersonal support before making a testing decision. Conversational agents have the potential to become a scalable alternative for pretest genetics education, reducing the clinical demand on genetics providers. UR - https://www.jmir.org/2021/11/e29447 UR - http://dx.doi.org/10.2196/29447 UR - http://www.ncbi.nlm.nih.gov/pubmed/34792472 ID - info:doi/10.2196/29447 ER - TY - JOUR AU - Chukwu, Emeka AU - Gilroy, Sonia AU - Addaquay, Kojo AU - Jones, Nafisa Nki AU - Karimu, Gbadia Victor AU - Garg, Lalit AU - Dickson, Eva Kim PY - 2021/11/12 TI - Formative Study of Mobile Phone Use for Family Planning Among Young People in Sierra Leone: Global Systematic Survey JO - JMIR Form Res SP - e23874 VL - 5 IS - 11 KW - young people KW - short message service KW - SMS KW - chatbot KW - text message KW - interactive voice response KW - IVR KW - WhatsApp KW - Facebook KW - family planning KW - contraceptives KW - Sierra Leone N2 - Background: Teenage pregnancy remains high with low contraceptive prevalence among adolescents (aged 15-19 years) in Sierra Leone. Stakeholders leverage multiple strategies to address the challenge. Mobile technology is pervasive and presents an opportunity to reach young people with critical sexual reproductive health and family planning messages. Objective: The objectives of this research study are to understand how mobile health (mHealth) is used for family planning, understand phone use habits among young people in Sierra Leone, and recommend strategies for mobile-enabled dissemination of family planning information at scale. Methods: This formative research study was conducted using a systematic literature review and focus group discussions (FGDs). The literature survey assessed similar but existing interventions through a systematic search of 6 scholarly databases. 
Cross-sections of young people of both sexes and their support groups were engaged in 9 FGDs in an urban and a rural district in Sierra Leone. The FGD data were qualitatively analyzed using MAXQDA software (VERBI Software GmbH) to determine appropriate technology channels, content, and format for different user segments. Results: Our systematic search results were categorized using Grading of Recommended Assessment and Evaluation (GRADE) into communication channels, audiovisual messaging format, purpose of the intervention, and message direction. The majority of reviewed articles report on SMS-based interventions. At the same time, most intervention purposes are for awareness and as helpful resources. Our survey did not find documented use of custom mHealth apps for family planning information dissemination. From the FGDs, more young people in Sierra Leone own basic mobile phones than feature phones or smartphones. Young people with smartphones use them mostly for WhatsApp and Facebook. Young people widely subscribe to the social media–only internet bundle, with the cost ranging from 1000 leones (US $0.11) to 1500 leones (US $0.16) daily. Pupils in both districts top up their voice call and SMS credit every day between 1000 leones (US $0.11) and 5000 leones (US $0.52). Conclusions: mHealth has facilitated family planning information dissemination for demand creation around the world. Despite the widespread use of social and new media, SMS is the scalable channel to reach literate and semiliterate young people. We have cataloged mHealth for contraceptive research to show SMS followed by call center as widely used channels. Jingles are popular for audiovisual message formats, mostly delivered as either push or pull only message directions (not both). Interactive voice response and automated calls are best suited to reach nonliterate young people at scale.
UR - https://formative.jmir.org/2021/11/e23874 UR - http://dx.doi.org/10.2196/23874 UR - http://www.ncbi.nlm.nih.gov/pubmed/34766908 ID - info:doi/10.2196/23874 ER - TY - JOUR AU - Dhinagaran, Ardhithy Dhakshenya AU - Sathish, Thirunavukkarasu AU - Kowatsch, Tobias AU - Griva, Konstadina AU - Best, Donovan James AU - Tudor Car, Lorainne PY - 2021/11/11 TI - Public Perceptions of Diabetes, Healthy Living, and Conversational Agents in Singapore: Needs Assessment JO - JMIR Form Res SP - e30435 VL - 5 IS - 11 KW - conversational agents KW - chatbots KW - diabetes KW - prediabetes KW - healthy lifestyle change KW - mobile phone N2 - Background: The incidence of chronic diseases such as type 2 diabetes is increasing in countries worldwide, including Singapore. Health professional–delivered healthy lifestyle interventions have been shown to prevent type 2 diabetes. However, ongoing personalized guidance from health professionals is not feasible or affordable at the population level. Novel digital interventions delivered using mobile technology, such as conversational agents, are a potential alternative for the delivery of healthy lifestyle change behavioral interventions to the public. Objective: We explored perceptions and experiences of Singaporeans on healthy living, diabetes, and mobile health (mHealth) interventions (apps and conversational agents). This study was conducted to help inform the design and development of a conversational agent focusing on healthy lifestyle changes. Methods: This qualitative study was conducted in August and September 2019. A total of 20 participants were recruited from relevant healthy living Facebook pages and groups. Semistructured interviews were conducted in person or over the telephone using an interview guide. Interviews were transcribed and analyzed in parallel by 2 researchers using Burnard's method, a structured approach for thematic content analysis.
Results: The collected data were organized into 4 main themes: use of conversational agents, ubiquity of smartphone apps, understanding of diabetes, and barriers and facilitators to healthy living in Singapore. Most participants used health-related mobile apps as well as conversational agents unrelated to health care. They provided diverse suggestions for future conversational agent-delivered interventions. Participants also highlighted several knowledge gaps in relation to diabetes and healthy living. Regarding barriers to healthy living, participants mentioned frequent dining out, high stress levels, lack of work-life balance, and lack of free time to engage in physical activity. In contrast, discipline, preplanning, and sticking to a routine were important for enabling a healthy lifestyle. Conclusions: Participants in this study commonly used mHealth interventions and provided important insights into their knowledge gaps and needs in relation to changes in healthy lifestyle behaviors. Future digital interventions such as conversational agents focusing on healthy lifestyle and diabetes prevention should aim to address the barriers highlighted in our study and motivate individuals to adopt healthy lifestyle behavior. UR - https://formative.jmir.org/2021/11/e30435 UR - http://dx.doi.org/10.2196/30435 UR - http://www.ncbi.nlm.nih.gov/pubmed/34762053 ID - info:doi/10.2196/30435 ER - TY - JOUR AU - Kim, Jihae Agnes AU - Yang, Jisun AU - Jang, Yihyun AU - Baek, Sang Joon PY - 2021/11/9 TI - Acceptance of an Informational Antituberculosis Chatbot Among Korean Adults: Mixed Methods Research JO - JMIR Mhealth Uhealth SP - e26424 VL - 9 IS - 11 KW - tuberculosis KW - chatbot KW - technology acceptance model KW - mobile phone N2 - Background: Tuberculosis (TB) is a highly infectious disease. Negative perceptions and insufficient knowledge have made its eradication difficult.
Recently, mobile health care interventions, such as an anti-TB chatbot developed by the research team, have emerged in support of TB eradication programs. However, before the anti-TB chatbot is deployed, it is important to understand the factors that predict its acceptance by the population. Objective: This study aims to explore the acceptance of an anti-TB chatbot that provides information about the disease and its treatment to people vulnerable to TB in South Korea. Thus, we are investigating the factors that predict technology acceptance through qualitative research based on the interviews of patients with TB and homeless facility personnel. We are then verifying the extended Technology Acceptance Model (TAM) and predicting the factors associated with the acceptance of the chatbot. Methods: In study 1, we conducted interviews with potential chatbot users to extract the factors that predict user acceptance and constructed a conceptual framework based on the TAM. In total, 16 interviews with patients with TB and one focus group interview with 10 experts on TB were conducted. In study 2, we conducted surveys of potential chatbot users to validate the extended TAM. Survey participants were recruited among late-stage patients in TB facilities and members of web-based communities sharing TB information. A total of 123 responses were collected. Results: The results indicate that perceived ease of use and social influence were significantly predictive of perceived usefulness (P=.04 and P<.001, respectively). Perceived usefulness was predictive of the attitude toward the chatbot (P<.001), whereas perceived ease of use (P=.88) was not. Behavioral intention was positively predicted by attitude toward the chatbot and facilitating conditions (P<.001 and P=.03, respectively). The research model explained 55.4% of the variance in the use of anti-TB chatbots. 
The moderating effect of TB history was found in the relationship between attitude toward the chatbot and behavioral intention (P=.01) and between facilitating conditions and behavioral intention (P=.02). Conclusions: This study can be used to inform future design of anti-TB chatbots and highlight the importance of services and the environment that empower people to use the technology. UR - https://mhealth.jmir.org/2021/11/e26424 UR - http://dx.doi.org/10.2196/26424 UR - http://www.ncbi.nlm.nih.gov/pubmed/34751667 ID - info:doi/10.2196/26424 ER - TY - JOUR AU - Bickmore, W. Timothy AU - Ólafsson, Stefán AU - O'Leary, K. Teresa PY - 2021/11/9 TI - Mitigating Patient and Consumer Safety Risks When Using Conversational Assistants for Medical Information: Exploratory Mixed Methods Experiment JO - J Med Internet Res SP - e30704 VL - 23 IS - 11 KW - conversational assistant KW - conversational interface KW - dialogue system KW - medical error KW - patient safety KW - risk mitigation KW - warnings KW - disclaimers KW - grounding KW - explainability KW - mobile phone N2 - Background: Prior studies have demonstrated the safety risks when patients and consumers use conversational assistants such as Apple's Siri and Amazon's Alexa for obtaining medical information. Objective: The aim of this study is to evaluate two approaches to reducing the likelihood that patients or consumers will act on the potentially harmful medical information they receive from conversational assistants. Methods: Participants were given medical problems to pose to conversational assistants that had been previously demonstrated to result in potentially harmful recommendations. Each conversational assistant's response was randomly varied to include either a correct or incorrect paraphrase of the query or a disclaimer message (or not) telling the participants that they should not act on the advice without first talking to a physician.
The participants were then asked what actions they would take based on their interaction, along with the likelihood of taking the action. The reported actions were recorded and analyzed, and the participants were interviewed at the end of each interaction. Results: A total of 32 participants completed the study, each interacting with 4 conversational assistants. The participants were on average aged 42.44 (SD 14.08) years, 53% (17/32) were women, and 66% (21/32) were college educated. Those participants who heard a correct paraphrase of their query were significantly more likely to state that they would follow the medical advice provided by the conversational assistant (χ²₁=3.1; P=.04). Those participants who heard a disclaimer message were significantly more likely to say that they would contact a physician or health professional before acting on the medical advice received (χ²₁=43.5; P=.001). Conclusions: Designers of conversational systems should consider incorporating both disclaimers and feedback on query understanding in response to user queries for medical advice. Unconstrained natural language input should not be used in systems designed specifically to provide medical advice. UR - https://www.jmir.org/2021/11/e30704 UR - http://dx.doi.org/10.2196/30704 UR - http://www.ncbi.nlm.nih.gov/pubmed/34751661 ID - info:doi/10.2196/30704 ER - TY - JOUR AU - Loveys, Kate AU - Sagar, Mark AU - Pickering, Isabella AU - Broadbent, Elizabeth PY - 2021/11/8 TI - A Digital Human for Delivering a Remote Loneliness and Stress Intervention to At-Risk Younger and Older Adults During the COVID-19 Pandemic: Randomized Pilot Trial JO - JMIR Ment Health SP - e31586 VL - 8 IS - 11 KW - COVID-19 KW - loneliness KW - stress KW - well-being KW - eHealth KW - digital human KW - conversational agent KW - older adults KW - chronic illness N2 - Background: Loneliness is a growing public health issue that has been exacerbated in vulnerable groups during the COVID-19 pandemic.
Computer agents are capable of delivering psychological therapies through the internet; however, there is limited research on their acceptability to date. Objective: The objectives of this study were to evaluate (1) the feasibility and acceptability of a remote loneliness and stress intervention with digital human delivery to at-risk adults and (2) the feasibility of the study methods in preparation for a randomized controlled trial. Methods: A parallel randomized pilot trial with a mixed design was conducted. Participants were adults aged 18 to 69 years with an underlying medical condition or aged 70 years or older with a Mini-Mental State Examination score of >24 (ie, at greater risk of developing severe COVID-19). Participants took part from their place of residence (independent living retirement village, 20; community dwelling, 7; nursing home, 3). Participants were randomly allocated to the intervention or waitlist control group that received the intervention 1 week later. The intervention involved completing cognitive behavioral and positive psychology exercises with a digital human facilitator on a website for at least 15 minutes per day over 1 week. The exercises targeted loneliness, stress, and psychological well-being. Feasibility was evaluated using dropout rates and behavioral observation data. Acceptability was evaluated from behavioral engagement data, the Friendship Questionnaire (adapted), self-report items, and qualitative questions. Psychological measures were administered to evaluate the feasibility of the trial methods and included the UCLA Loneliness Scale, the 4-item Perceived Stress Scale, a 1-item COVID-19 distress measure, the Flourishing Scale, and the Scale of Positive and Negative Experiences. Results: The study recruited 30 participants (15 per group). Participants were 22 older adults and 8 younger adults with a health condition. Six participants dropped out of the study. 
Thus, the data of 24 participants were analyzed (intervention group, 12; waitlist group, 12). The digital human intervention and trial methods were generally found to be feasible and acceptable in younger and older adults living independently, based on intervention completion, and behavioral, qualitative, and some self-report data. The intervention and trial methods were less feasible to nursing home residents who required caregiver assistance. Acceptability could be improved with additional content, tailoring to the population, and changes to the digital human's design. Conclusions: Digital humans are a promising and novel technological solution for providing at-risk adults with access to remote psychological support during the COVID-19 pandemic. Research should further examine design techniques to improve their acceptability in this application and investigate intervention effectiveness in a randomized controlled trial. Trial Registration: Australia New Zealand Clinical Trials Registry ACTRN12620000786998; https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=380113 UR - https://mental.jmir.org/2021/11/e31586 UR - http://dx.doi.org/10.2196/31586 UR - http://www.ncbi.nlm.nih.gov/pubmed/34596572 ID - info:doi/10.2196/31586 ER - TY - JOUR AU - ter Stal, Silke AU - Sloots, Joanne AU - Ramlal, Aniel AU - op den Akker, Harm AU - Lenferink, Anke AU - Tabak, Monique PY - 2021/11/4 TI - An Embodied Conversational Agent in an eHealth Self-management Intervention for Chronic Obstructive Pulmonary Disease and Chronic Heart Failure: Exploratory Study in a Real-life Setting JO - JMIR Hum Factors SP - e24110 VL - 8 IS - 4 KW - embodied conversational agent KW - eHealth KW - self-management KW - design KW - daily life evaluation N2 - Background: Embodied conversational agents (ECAs) have the potential to stimulate actual use of eHealth apps.
An ECA's design influences the user's perception during short interactions, but daily life evaluations of ECAs in health care are scarce. Objective: This is an exploratory, long-term study on the design of ECAs for eHealth. The study investigates how patients perceive the design of the ECA over time with regard to the ECA's characteristics (friendliness, trustworthiness, involvement, expertise, and authority), small talk interaction, and likeliness of following the agent's advice. Methods: We developed an ECA within an eHealth self-management intervention for patients with both chronic obstructive pulmonary disease (COPD) and chronic heart failure (CHF), which we offered for 4 months. Patients rated 5 agent characteristics and likeliness of following the agent's advice before use and after 3 and 9 weeks of use. The amount of patients' small talk interaction was assessed by log data. Lastly, individual semistructured interviews were used to triangulate results. Results: Eleven patients (7 male and 4 female) with COPD and CHF participated (median age 70 years). Patients' perceptions of the agent characteristics did not change over time (P>.05 for all characteristics) and only 1 participant finished all small talk dialogues. After 3 weeks of use, the patients were less likely to follow the agent's advice (P=.01). The agent's messages were perceived as nonpersonalized and the feedback as inappropriate, affecting the agent's perceived reliability. Conclusions: This exploratory study provides first insights into ECA design for eHealth. The first impression of an ECA's design seems to remain during long-term use. To investigate future added value of ECAs in eHealth, perceived reliability should be improved by managing users' expectations of the ECA's capabilities and creating ECA designs fitting individual needs.
Trial Registration: Netherlands Trial Register NL6480; https://www.trialregister.nl/trial/6480 UR - https://humanfactors.jmir.org/2021/4/e24110 UR - http://dx.doi.org/10.2196/24110 UR - http://www.ncbi.nlm.nih.gov/pubmed/34734824 ID - info:doi/10.2196/24110 ER - TY - JOUR AU - Woodcock, Claire AU - Mittelstadt, Brent AU - Busbridge, Dan AU - Blank, Grant PY - 2021/11/3 TI - The Impact of Explanations on Layperson Trust in Artificial Intelligence–Driven Symptom Checker Apps: Experimental Study JO - J Med Internet Res SP - e29386 VL - 23 IS - 11 KW - symptom checker KW - chatbot KW - artificial intelligence KW - explanations KW - trust KW - knowledge KW - clinical communication KW - mHealth KW - digital health KW - eHealth KW - conversational agent KW - virtual health care KW - symptoms KW - diagnostics KW - mobile phone N2 - Background: Artificial intelligence (AI)–driven symptom checkers are available to millions of users globally and are advocated as a tool to deliver health care more efficiently. To achieve the promoted benefits of a symptom checker, laypeople must trust and subsequently follow its instructions. In AI, explanations are seen as a tool to communicate the rationale behind black-box decisions to encourage trust and adoption. However, the effectiveness of the types of explanations used in AI-driven symptom checkers has not yet been studied. Explanations can follow many forms, including why-explanations and how-explanations. Social theories suggest that why-explanations are better at communicating knowledge and cultivating trust among laypeople. Objective: The aim of this study is to ascertain whether explanations provided by a symptom checker affect explanatory trust among laypeople and whether this trust is impacted by their existing knowledge of disease. Methods: A cross-sectional survey of 750 healthy participants was conducted.
The participants were shown a video of a chatbot simulation that resulted in the diagnosis of either a migraine or temporal arteritis, chosen for their differing levels of epidemiological prevalence. These diagnoses were accompanied by one of four types of explanations. Each explanation type was selected either because of its current use in symptom checkers or because it was informed by theories of contrastive explanation. Exploratory factor analysis of participants' responses followed by comparison-of-means tests were used to evaluate group differences in trust. Results: Depending on the treatment group, two or three variables were generated, reflecting the prior knowledge and subsequent mental model that the participants held. When varying explanation type by disease, migraine was found to be nonsignificant (P=.65) and temporal arteritis, marginally significant (P=.09). Varying disease by explanation type resulted in statistical significance for input influence (P=.001), social proof (P=.049), and no explanation (P=.006), with counterfactual explanation (P=.053). The results suggest that trust in explanations is significantly affected by the disease being explained. When laypeople have existing knowledge of a disease, explanations have little impact on trust. Where the need for information is greater, different explanation types engender significantly different levels of trust. These results indicate that to be successful, symptom checkers need to tailor explanations to each user's specific question and discount the diseases that they may also be aware of. Conclusions: System builders developing explanations for symptom-checking apps should consider the recipient's knowledge of a disease and tailor explanations to each user's specific need. Effort should be placed on generating explanations that are personalized to each user of a symptom checker to fully discount the diseases that they may be aware of and to close their information gap.
UR - https://www.jmir.org/2021/11/e29386 UR - http://dx.doi.org/10.2196/29386 UR - http://www.ncbi.nlm.nih.gov/pubmed/34730544 ID - info:doi/10.2196/29386 ER - TY - JOUR AU - Wong, Jill AU - Foussat, C. Agathe AU - Ting, Steven AU - Acerbi, Enzo AU - van Elburg, M. Ruurd AU - Mei Chien, Chua PY - 2021/10/26 TI - A Chatbot to Engage Parents of Preterm and Term Infants on Parental Stress, Parental Sleep, and Infant Feeding: Usability and Feasibility Study JO - JMIR Pediatr Parent SP - e30169 VL - 4 IS - 4 KW - chatbot KW - parental stress KW - parental sleep KW - infant feeding KW - preterm infants KW - term infants KW - sleep KW - stress KW - eHealth KW - support KW - anxiety KW - usability N2 - Background: Parents commonly experience anxiety, worry, and psychological distress in caring for newborn infants, particularly those born preterm. Web-based therapist services may offer greater accessibility and timely psychological support for parents but are nevertheless labor intensive due to their interactive nature. Chatbots that simulate humanlike conversations show promise for such interactive applications. Objective: The aim of this study is to explore the usability and feasibility of chatbot technology for gathering real-life conversation data on stress, sleep, and infant feeding from parents with newborn infants and to investigate differences between the experiences of parents with preterm and term infants. Methods: Parents aged ≥21 years with infants aged ≤6 months were enrolled from November 2018 to March 2019. Three chatbot scripts (stress, sleep, feeding) were developed to capture conversations with parents via their mobile devices. Parents completed a chatbot usability questionnaire upon study completion. Responses to closed-ended questions and manually coded open-ended responses were summarized descriptively. Open-ended responses were analyzed using the latent Dirichlet allocation method to uncover semantic topics.
Results: Of 45 enrolled participants (20 preterm, 25 term), 26 completed the study. Parents rated the chatbot as "easy" to use (mean 4.08, SD 0.74; 1=very difficult, 5=very easy) and were "satisfied" (mean 3.81, SD 0.90; 1=very dissatisfied, 5=very satisfied). Of 45 enrolled parents, those with preterm infants reported emotional stress more frequently than did parents of term infants (33 vs 24 occasions). Parents generally reported satisfactory sleep quality. The preterm group reported feeding problems more frequently than did the term group (8 vs 2 occasions). In stress domain conversations, topics linked to "discomfort" and "tiredness" were more prevalent in preterm group conversations, whereas the topic of "positive feelings" occurred more frequently in the term group conversations. Interestingly, feeding-related topics dominated the content of sleep domain conversations, suggesting that frequent or irregular feeding may affect parents' ability to get adequate sleep or rest. Conclusions: The chatbot was successfully used to collect real-time conversation data on stress, sleep, and infant feeding from a group of 45 parents. In their chatbot conversations, term group parents frequently expressed positive emotions, whereas preterm group parents frequently expressed physical discomfort and tiredness, as well as emotional stress. Overall, parents who completed the study gave positive feedback on their user experience with the chatbot as a tool to express their thoughts and concerns. Trial Registration: ClinicalTrials.gov NCT03630679; https://clinicaltrials.gov/ct2/show/NCT03630679 UR - https://pediatrics.jmir.org/2021/4/e30169 UR - http://dx.doi.org/10.2196/30169 UR - http://www.ncbi.nlm.nih.gov/pubmed/34544679 ID - info:doi/10.2196/30169 ER - TY - JOUR AU - Nilsson, Evalill AU - Sverker, Annette AU - Bendtsen, Preben AU - Eldh, Catrine Ann PY - 2021/10/18 TI - A Human, Organization, and Technology Perspective on Patients'
Experiences of a Chat-Based and Automated Medical History-Taking Service in Primary Health Care: Interview Study Among Primary Care Patients JO - J Med Internet Res SP - e29868 VL - 23 IS - 10 KW - digital encounter KW - digital healthcare KW - e-consultation KW - e-health KW - interview KW - patient perspective KW - primary healthcare KW - qualitative study KW - telemedicine KW - telehealth N2 - Background: The use of e-visits in health care is progressing rapidly worldwide. To date, studies on the advantages and disadvantages of e-consultations in the form of chat services for all inquiries in primary care have focused on the perspective of health care professionals (HCPs) rather than those of end users (patients). Objective: This study aims to explore patients' experiences using a chat-based and automated medical history-taking service in regular, tax-based, not-for-profit primary care in Sweden. Methods: Overall, 25 individual interviews were conducted with patients in the catchment areas of 5 primary care centers (PCCs) in Sweden that tested a chat-based and automated medical history-taking service for all types of patient inquiries. The semistructured interviews were transcribed verbatim before content analysis using inductive and deductive strategies, the latter including an unconstrained matrix of human, organization, and technology perspectives. Results: The service provided an easily managed way for patients to make written contact with HCPs, which was considered beneficial for some patients and issues but less suitable for others (acute or more complex cases). The automated medical history-taking service was perceived as having potential but still derived from what HCPs need to know and how they address and communicate health and health care issues. Technical skills were not considered as necessary for a mobile phone chat as for handling a computer; however, patients still expressed concern for people with less digital literacy.
The opportunity to take one's time and reflect on one's situation before answering questions from the HCPs was found to reduce stress and prevent errors, and patients speculated that it might be the same for the HCPs on the other end of the system. Patients appreciated the ability to have a conversation from almost anywhere, even from places not suitable for telephone calls. The asynchronicity of the chat service allowed the patients to take more control of the conversation and initiate a chat at any time at their own convenience; however, it could also lead to lengthy conversations where a single issue in the worst cases could take days to close. The opportunity to upload photographs made some visits to the PCC redundant, which would otherwise have been necessary if the ordinary telephone service had been used, saving patients both time and money. Conclusions: Patients generally had a positive attitude toward e-visits in primary care and were generally pleased with the prospects of the digital tool tested, somewhat more with the actual chat than with the automated history-taking system preceding the chat. Although patients expect their PCC to offer a range of different means of communication, the human, organization, and technology analysis revealed a need for more extensive (end) user experience design in the further development of the chat service. UR - https://www.jmir.org/2021/10/e29868 UR - http://dx.doi.org/10.2196/29868 UR - http://www.ncbi.nlm.nih.gov/pubmed/34661544 ID - info:doi/10.2196/29868 ER - TY - JOUR AU - Sager, A. Monique AU - Kashyap, M. Aditya AU - Tamminga, Mila AU - Ravoori, Sadhana AU - Callison-Burch, Christopher AU - Lipoff, B.
Jules PY - 2021/9/30 TI - Identifying and Responding to Health Misinformation on Reddit Dermatology Forums With Artificially Intelligent Bots Using Natural Language Processing: Design and Evaluation Study JO - JMIR Dermatol SP - e20975 VL - 4 IS - 2 KW - bots KW - natural language processing KW - artificial intelligence KW - Reddit KW - medical misinformation KW - health misinformation KW - detecting misinformation KW - dermatology KW - misinformation N2 - Background: Reddit, the fifth most popular website in the United States, boasts a large and engaged user base on its dermatology forums where users crowdsource free medical opinions. Unfortunately, much of the advice provided is unvalidated and could lead to the provision of inappropriate care. Initial testing has revealed that artificially intelligent bots can detect misinformation regarding tanning and essential oils on Reddit dermatology forums and may be able to produce responses to posts containing misinformation. Objective: To analyze the ability of bots to find and respond to tanning and essential oil-related health misinformation on Reddit's dermatology forums in a controlled test environment. Methods: Using natural language processing techniques, we trained bots to target misinformation using relevant keywords and to post prefabricated responses. By evaluating different model architectures across a held-out test set, we compared performances. Results: Our models yielded test accuracies ranging from 95% to 100%, with a Bidirectional Encoder Representations from Transformers (BERT) fine-tuned model resulting in the highest level of test accuracy. Bots were then able to post corrective prefabricated responses to misinformation in a test environment. Conclusions: Using a limited data set, bots accurately detected examples of health misinformation within Reddit dermatology forums. Given that these bots can then post prefabricated responses, this technique may allow for interception of misinformation.
Providing correct information does not mean that users will be receptive or find such interventions persuasive. Further studies should investigate this strategy's effectiveness to inform future deployment of bots as a technique in combating health misinformation. UR - https://derma.jmir.org/2021/2/e20975 UR - http://dx.doi.org/10.2196/20975 UR - http://www.ncbi.nlm.nih.gov/pubmed/37632809 ID - info:doi/10.2196/20975 ER - TY - JOUR AU - Boustani, Maya AU - Lunn, Stephanie AU - Visser, Ubbo AU - Lisetti, Christine PY - 2021/9/29 TI - Development, Feasibility, Acceptability, and Utility of an Expressive Speech-Enabled Digital Health Agent to Deliver Online, Brief Motivational Interviewing for Alcohol Misuse: Descriptive Study JO - J Med Internet Res SP - e25837 VL - 23 IS - 9 KW - digital health agent KW - virtual health assistant KW - online intervention KW - alcohol abuse KW - brief intervention KW - motivational interviewing KW - intelligent virtual agent KW - embodied conversational agent N2 - Background: Digital health agents (embodied conversational agents designed specifically for health interventions) provide a promising alternative or supplement to behavioral health services by reducing barriers to access to care. Objective: Our goals were to (1) develop an expressive, speech-enabled digital health agent operating in a 3-dimensional virtual environment to deliver a brief behavioral health intervention over the internet to reduce alcohol use and to (2) understand its acceptability, feasibility, and utility with its end users. Methods: We developed an expressive, speech-enabled digital health agent with facial expressions and body gestures operating in a 3-dimensional virtual office and able to deliver a brief behavioral health intervention over the internet to reduce alcohol use. We then asked 51 alcohol users to report on the digital health agent's acceptability, feasibility, and utility.
Results: The developed digital health agent uses speech recognition and a model of empathetic verbal and nonverbal behaviors to engage the user, and its performance enabled it to successfully deliver a brief behavioral health intervention over the internet to reduce alcohol use. Descriptive statistics indicated that participants had overwhelmingly positive experiences with the digital health agent, including engagement with the technology, acceptance, perceived utility, and intent to use the technology. Illustrative qualitative quotes provided further insight about the potential reach and impact of digital health agents in behavioral health care. Conclusions: Web-delivered interventions delivered by expressive, speech-enabled digital health agents may provide an exciting complement or alternative to traditional one-on-one treatment. They may be especially helpful for hard-to-reach communities with behavioral workforce shortages. UR - https://www.jmir.org/2021/9/e25837 UR - http://dx.doi.org/10.2196/25837 UR - http://www.ncbi.nlm.nih.gov/pubmed/34586074 ID - info:doi/10.2196/25837 ER - TY - JOUR AU - Anan, Tomomi AU - Kajiki, Shigeyuki AU - Oka, Hiroyuki AU - Fujii, Tomoko AU - Kawamata, Kayo AU - Mori, Koji AU - Matsudaira, Ko PY - 2021/9/24 TI - Effects of an Artificial Intelligence-Assisted Health Program on Workers With Neck/Shoulder Pain/Stiffness and Low Back Pain: Randomized Controlled Trial JO - JMIR Mhealth Uhealth SP - e27535 VL - 9 IS - 9 KW - neck pain KW - shoulder pain KW - shoulder stiffness KW - low back pain KW - musculoskeletal symptoms KW - digital intervention KW - mobile app KW - mHealth KW - eHealth KW - digital health KW - mobile phone N2 - Background: Musculoskeletal symptoms such as neck and shoulder pain/stiffness and low back pain are common health problems in the working population. They are the leading causes of presenteeism (employees being physically present at work but unable to be fully engaged).
Recently, digital interventions have begun to be used to manage health but their effectiveness has not yet been fully verified, and adherence to such programs is always a problem. Objective: This study aimed to evaluate the improvements in musculoskeletal symptoms in workers with neck/shoulder stiffness/pain and low back pain after the use of an exercise-based artificial intelligence (AI)-assisted interactive health promotion system that operates through a mobile messaging app (the AI-assisted health program). We expected that this program would support participants' adherence to exercises. Methods: We conducted a two-armed, randomized, controlled, and unblinded trial in workers with either neck/shoulder stiffness/pain or low back pain or both. We recruited participants with these symptoms through email notifications. The intervention group received the AI-assisted health program, in which the chatbot sent messages to users with the exercise instructions at a fixed time every day through the smartphone's chatting app (LINE) for 12 weeks. The program was fully automated. The control group continued with their usual care routines. We assessed the subjective severity of the neck and shoulder pain/stiffness and low back pain of the participants by using a scoring scale of 1 to 5 for both the intervention group and the control group at baseline and after 12 weeks of intervention by using a web-based form. We used a logistic regression model to calculate the odds ratios (ORs) for achieving reduced pain scores in the intervention group compared with the control group, and the ORs for the subjective assessment of symptom improvement in the intervention group compared with the control group; these analyses were performed using Stata software (version 16, StataCorp LLC). Results: We analyzed 48 participants in the intervention group and 46 participants in the control group. The adherence rate was 92% (44/48) during the intervention.
The participants in the intervention group showed significant improvements in the severity of the neck/shoulder pain/stiffness and low back pain compared to those in the control group (OR 6.36, 95% CI 2.57-15.73; P<.001). Based on the subjective assessment of the improvement of the pain/stiffness at 12 weeks, 36 (75%) out of 48 participants in the intervention group and 3 (7%) out of 46 participants in the control group showed improvements (improved, slightly improved) (OR 43.00, 95% CI 11.25-164.28; P<.001). Conclusions: This study shows that the short exercises provided by the AI-assisted health program improved both neck/shoulder pain/stiffness and low back pain in 12 weeks. Further studies are needed to identify the elements contributing to the successful outcome of the AI-assisted health program. Trial Registration: University hospital Medical Information Network-Clinical Trials Registry (UMIN-CTR) 000033894; https://upload.umin.ac.jp/cgi-open-bin/ctr_e/ctr_view.cgi?recptno=R000038307. 
UR - https://mhealth.jmir.org/2021/9/e27535 UR - http://dx.doi.org/10.2196/27535 UR - http://www.ncbi.nlm.nih.gov/pubmed/34559054 ID - info:doi/10.2196/27535 ER - TY - JOUR AU - Siedlikowski, Sophia AU - Noël, Louis-Philippe AU - Moynihan, Anne Stephanie AU - Robin, Marc PY - 2021/9/21 TI - Chloe for COVID-19: Evolution of an Intelligent Conversational Agent to Address Infodemic Management Needs During the COVID-19 Pandemic JO - J Med Internet Res SP - e27283 VL - 23 IS - 9 KW - chatbot KW - COVID-19 KW - conversational agents KW - public health KW - artificial intelligence KW - infodemic KW - infodemiology KW - misinformation KW - digital health KW - virtual care UR - https://www.jmir.org/2021/9/e27283 UR - http://dx.doi.org/10.2196/27283 UR - http://www.ncbi.nlm.nih.gov/pubmed/34375299 ID - info:doi/10.2196/27283 ER - TY - JOUR AU - Guerreiro, Pereira Mara AU - Angelini, Leonardo AU - Rafael Henriques, Helga AU - El Kamali, Mira AU - Baixinho, Cristina AU - Balsa, João AU - Félix, Brito Isa AU - Khaled, Abou Omar AU - Carmo, Beatriz Maria AU - Cláudio, Paula Ana AU - Caon, Maurizio AU - Daher, Karl AU - Alexandre, Bruno AU - Padinha, Mafalda AU - Mugellini, Elena PY - 2021/9/17 TI - Conversational Agents for Health and Well-being Across the Life Course: Protocol for an Evidence Map JO - JMIR Res Protoc SP - e26680 VL - 10 IS - 9 KW - artificial intelligence KW - conversational agent KW - chatbot KW - virtual assistant KW - relational agent KW - virtual humans KW - e-coach KW - intervention KW - health KW - well-being N2 - Background: Conversational agents, which we defined as computer programs that are designed to simulate two-way human conversation by using language and are potentially supplemented with nonlanguage modalities, offer promising avenues for health interventions for different populations across the life course.
There is a lack of open-access and user-friendly resources for identifying research trends and gaps and pinpointing expertise across international centers. Objective: Our aim is to provide an overview of all relevant evidence on conversational agents for health and well-being across the life course. Specifically, our objectives are to identify, categorize, and synthesize, through visual formats and a searchable database, primary studies and reviews in this research field. Methods: An evidence map was selected as the type of literature review to be conducted, as it optimally corresponded to our aim. We systematically searched 8 databases (MEDLINE; CINAHL; Web of Science; Scopus; the Cochrane, ACM, IEEE, and Joanna Briggs Institute databases; and Google Scholar). We will perform backward citation searching on all included studies. The first stage of a double-stage screening procedure, which was based on abstracts and titles only, was conducted by using predetermined eligibility criteria for primary studies and reviews. An operational screening procedure was developed for streamlined and consistent screening across the team. Double data extraction will be performed with previously piloted data collection forms. We will appraise systematic reviews by using A Measurement Tool to Assess Systematic Reviews (AMSTAR) 2. Primary studies and reviews will be assessed separately in the analysis. Data will be synthesized through descriptive statistics, bivariate statistics, and subgroup analysis (if appropriate) and through high-level maps such as scatter and bubble charts. The development of the searchable database will be informed by the research questions and data extraction forms. Results: As of April 2021, the literature search in the eight databases was concluded, yielding a total of 16,351 records. The first stage of screening, which was based on abstracts and titles only, resulted in the selection of 1282 records of primary studies and 151 records of reviews.
These will be subjected to second-stage screening. A glossary with operational definitions for supporting the study selection and data extraction stages was drafted. The anticipated completion date is October 2021. Conclusions: Our wider definition of a conversational agent and the broad scope of our evidence map will explicate trends and gaps in this field of research. Additionally, our evidence map and searchable database of studies will help researchers to avoid fragmented research efforts and wasteful redundancies. Finally, as part of the Harnessing the Power of Conversational e-Coaches for Health and Well-being Through Swiss-Portuguese Collaboration project, our work will also inform the development of an international taxonomy on conversational agents for health and well-being, thereby contributing to terminology standardization and categorization. International Registered Report Identifier (IRRID): DERR1-10.2196/26680 UR - https://www.researchprotocols.org/2021/9/e26680 UR - http://dx.doi.org/10.2196/26680 UR - http://www.ncbi.nlm.nih.gov/pubmed/34533460 ID - info:doi/10.2196/26680 ER - TY - JOUR AU - Luo, Christina Tiffany AU - Aguilera, Adrian AU - Lyles, Rees Courtney AU - Figueroa, Astrid Caroline PY - 2021/9/14 TI - Promoting Physical Activity Through Conversational Agents: Mixed Methods Systematic Review JO - J Med Internet Res SP - e25486 VL - 23 IS - 9 KW - physical activity KW - health behavior KW - behavior change KW - conversational agent KW - virtual agent KW - chatbot KW - digital health KW - eHealth KW - mHealth KW - mobile health KW - mobile phone N2 - Background: Regular physical activity (PA) is crucial for well-being; however, healthy habits are difficult to create and maintain. Interventions delivered via conversational agents (eg, chatbots or virtual agents) are a novel and potentially accessible way to promote PA. Thus, it is important to understand the evolving landscape of research that uses conversational agents. 
Objective: This mixed methods systematic review aims to summarize the usability and effectiveness of conversational agents in promoting PA, describe common theories and intervention components used, and identify areas for further development. Methods: We conducted a mixed methods systematic review. We searched seven electronic databases (PsycINFO, PubMed, Embase, CINAHL, ACM Digital Library, Scopus, and Web of Science) for quantitative, qualitative, and mixed methods studies that conveyed primary research on automated conversational agents designed to increase PA. The studies were independently screened by 2 reviewers, who assessed methodological quality using the Mixed Methods Appraisal Tool. Data on intervention impact and effectiveness, treatment characteristics, and challenges were extracted and analyzed using parallel-results convergent synthesis and narrative summary. Results: In total, 255 studies were identified, 7.8% (20) of which met our inclusion criteria. The methodological quality of the studies was varied. Overall, conversational agents had moderate usability and feasibility. Those that were evaluated through randomized controlled trials were found to be effective in promoting PA. Common challenges facing interventions were repetitive program content, high attrition, technical issues, and safety and privacy concerns. Conclusions: Conversational agents hold promise for PA interventions. However, there is a lack of rigorous research on long-term intervention effectiveness and patient safety. Future interventions should be based on evidence-informed theories and treatment approaches and should address users' desires for program variety, natural language processing, delivery via mobile devices, and safety and privacy concerns.
UR - https://www.jmir.org/2021/9/e25486 UR - http://dx.doi.org/10.2196/25486 UR - http://www.ncbi.nlm.nih.gov/pubmed/34519653 ID - info:doi/10.2196/25486 ER - TY - JOUR AU - Mauriello, Louis Matthew AU - Tantivasadakarn, Nantanick AU - Mora-Mendoza, Antonio Marco AU - Lincoln, Thierry Emmanuel AU - Hon, Grace AU - Nowruzi, Parsa AU - Simon, Dorien AU - Hansen, Luke AU - Goenawan, H. Nathaniel AU - Kim, Joshua AU - Gowda, Nikhil AU - Jurafsky, Dan AU - Paredes, Enrique Pablo PY - 2021/9/14 TI - A Suite of Mobile Conversational Agents for Daily Stress Management (Popbots): Mixed Methods Exploratory Study JO - JMIR Form Res SP - e25294 VL - 5 IS - 9 KW - conversational agents KW - virtual agent KW - chatbot KW - therapy KW - stress relief KW - stress management KW - mental health KW - stress KW - exploratory KW - support KW - mobile phone N2 - Background: Approximately 60%-80% of primary care visits have a psychological stress component, but only 3% of patients receive stress management advice during these visits. Given recent advances in natural language processing, there is renewed interest in mental health chatbots. Conversational agents that can understand a user's problems and deliver advice that mitigates the effects of daily stress could be an effective public health tool. However, such systems are complex to build and costly to develop. Objective: To address these challenges, our aim is to develop and evaluate a fully automated mobile suite of shallow chatbots (we call them Popbots) that may serve as a new species of chatbots and further complement human assistance in an ecosystem of stress management support. Methods: After conducting an exploratory Wizard of Oz study (N=14) to evaluate the feasibility of a suite of multiple chatbots, we conducted a web-based study (N=47) to evaluate the implementation of our prototype.
Each participant was randomly assigned to a different chatbot designed on the basis of a proven cognitive or behavioral intervention method. To measure the effectiveness of the chatbots, the participants' stress levels were determined using self-reported psychometric evaluations (eg, web-based daily surveys and Patient Health Questionnaire-4). The participants in these studies were recruited through email and enrolled on the web, and some of them participated in follow-up interviews that were conducted in person or on the web (as necessary). Results: Of the 47 participants, 31 (66%) completed the main study. The findings suggest that the users viewed the conversations with our chatbots as helpful or at least neutral and came away with increasingly positive sentiment toward the use of chatbots for proactive stress management. Moreover, those users who used the system more often (ie, they had more than or equal to the median number of conversations) noted a decrease in depression symptoms compared with those who used the system less often based on a Wilcoxon signed-rank test (W=91.50; Z=-2.54; P=.01; r=0.47). The follow-up interviews with a subset of the participants indicated that half of the common daily stressors could be discussed with chatbots, potentially reducing the burden on human coping resources. Conclusions: Our work suggests that suites of shallow chatbots may offer benefits for both users and designers. As a result, this study's contributions include the design and evaluation of a novel suite of shallow chatbots for daily stress management, a summary of benefits and challenges associated with random delivery of multiple conversational interventions, and design guidelines and directions for future research into similar systems, including authoring chatbot systems and artificial intelligence-enabled recommendation algorithms.
UR - https://formative.jmir.org/2021/9/e25294 UR - http://dx.doi.org/10.2196/25294 UR - http://www.ncbi.nlm.nih.gov/pubmed/34519655 ID - info:doi/10.2196/25294 ER - TY - JOUR AU - Klos, Carolina Maria AU - Escoredo, Milagros AU - Joerin, Angela AU - Lemos, Noemí Viviana AU - Rauws, Michiel AU - Bunge, L. Eduardo PY - 2021/8/12 TI - Artificial Intelligence-Based Chatbot for Anxiety and Depression in University Students: Pilot Randomized Controlled Trial JO - JMIR Form Res SP - e20678 VL - 5 IS - 8 KW - artificial intelligence KW - chatbots KW - conversational agents KW - mental health KW - anxiety KW - depression KW - college students N2 - Background: Artificial intelligence-based chatbots are emerging as instruments of psychological intervention; however, no relevant studies have been reported in Latin America. Objective: The objective of the present study was to evaluate the viability, acceptability, and potential impact of using Tess, a chatbot, for examining symptoms of depression and anxiety in university students. Methods: This was a pilot randomized controlled trial. The experimental condition used Tess for 8 weeks, and the control condition was assigned to a psychoeducation book on depression. Comparisons were conducted using Mann-Whitney U and Wilcoxon tests for depressive symptoms, and independent and paired sample t tests to analyze anxiety symptoms. Results: The initial sample consisted of 181 Argentinian college students (158, 87.2% female) aged 18 to 33. Data at week 8 were provided by 39 out of the 99 (39%) participants in the experimental condition and 34 out of the 82 (41%) in the control group. On an average, 472 (SD 249.52) messages were exchanged, with 116 (SD 73.87) of the messages sent from the users in response to Tess. A higher number of messages exchanged with Tess was associated with positive feedback (F2,36=4.37; P=.02).
No significant differences between the experimental and control groups were found from the baseline to week 8 for depressive and anxiety symptoms. However, significant intragroup differences demonstrated that the experimental group showed a significant decrease in anxiety symptoms; no such differences were observed for the control group. Further, no significant intragroup differences were found for depressive symptoms. Conclusions: The students spent a considerable amount of time exchanging messages with Tess and positive feedback was associated with a higher number of messages exchanged. The initial results show promising evidence for the usability and acceptability of Tess in the Argentinian population. Research on chatbots is still in its initial stages and further research is needed. UR - https://formative.jmir.org/2021/8/e20678 UR - http://dx.doi.org/10.2196/20678 UR - http://www.ncbi.nlm.nih.gov/pubmed/34092548 ID - info:doi/10.2196/20678 ER - TY - JOUR AU - Heffner, L. Jaimee AU - Watson, L. Noreen AU - Serfozo, Edit AU - Kelly, M. Megan AU - Reilly, D. Erin AU - Kim, Daniella AU - Baker, Kelsey AU - Scout, N. N. F. AU - Karekla, Maria PY - 2021/7/30 TI - An Avatar-Led Digital Smoking Cessation Program for Sexual and Gender Minority Young Adults: Intervention Development and Results of a Single-Arm Pilot Trial JO - JMIR Form Res SP - e30241 VL - 5 IS - 7 KW - LGBT KW - embodied agent KW - tobacco cessation KW - nicotine dependence KW - user-centered design KW - avatar KW - digital health KW - minority KW - young adult KW - teenager KW - smoking KW - cessation KW - intervention KW - development KW - pilot trial N2 - Background: Sexual and gender minority young adults have a high prevalence of smoking and unique barriers to accessing tobacco treatment. 
Objective: To address these challenges as well as their preferences for sexual and gender minority-targeted interventions and digital programs, we developed and evaluated the acceptability, preliminary efficacy, and impact on theory-based change processes of an acceptance and commitment therapy-based digital program called Empowered, Queer, Quitting, and Living (EQQUAL). Methods: Participants (n=22) of a single-arm trial conducted to evaluate the program were young adults, age 18 to 30 years, who self-identified as sexual and gender minority individuals and smoked at least one cigarette per day. All participants received access to the EQQUAL program. Participants completed web-based surveys at baseline and at a follow-up 2 months after enrollment. We verified self-reported smoking abstinence with biochemical testing; missing data were counted as smoking or using tobacco. Results: For young adults who logged in at least once (n=18), the mean number of log-ins was 5.5 (SD 3.6), mean number of sessions completed was 3.1 (SD 2.6), and 39% (7/18) completed all 6 sessions. Overall, 93% of participants (14/15) were satisfied with the EQQUAL program, 100% (15/15) found it easy to use, and 100% (15/15) said it helped them be clearer about how to quit. Abstinence from smoking or using tobacco was confirmed with biochemical testing for 23% of participants (5/22). Both quantitative and qualitative results suggested a positive overall response to the avatar guide, with areas for future improvement largely centered on the avatar's appearance and movements. Conclusions: Treatment acceptability of EQQUAL was very promising. The rate of abstinence, which was biochemically confirmed, was 3 times higher than that of the only other digital program to date that has targeted sexual and gender minority young adults and 6 to 13 times higher than those of nontargeted digital smoking interventions among sexual and gender minority young adults.
Planned improvements for the next iteration of the program include making the avatar's movements more natural; offering multiple avatar guides with different characteristics such as race, ethnicity, and gender identity from which to choose; and providing a support forum for users to connect anonymously with peers. UR - https://formative.jmir.org/2021/7/e30241 UR - http://dx.doi.org/10.2196/30241 UR - http://www.ncbi.nlm.nih.gov/pubmed/34328430 ID - info:doi/10.2196/30241 ER - TY - JOUR AU - de Pennington, Nick AU - Mole, Guy AU - Lim, Ernest AU - Milne-Ives, Madison AU - Normando, Eduardo AU - Xue, Kanmin AU - Meinert, Edward PY - 2021/7/28 TI - Safety and Acceptability of a Natural Language Artificial Intelligence Assistant to Deliver Clinical Follow-up to Cataract Surgery Patients: Proposal JO - JMIR Res Protoc SP - e27227 VL - 10 IS - 7 KW - artificial intelligence KW - natural language processing KW - telemedicine KW - cataract KW - aftercare KW - speech recognition software KW - medical informatics KW - health services KW - health communication KW - delivery of health care KW - patient acceptance of health care KW - mental health KW - cell phone KW - internet KW - conversational agent KW - chatbot KW - expert systems KW - dialogue system KW - relational agent N2 - Background: Due to an aging population, the demand for many services is exceeding the capacity of the clinical workforce. As a result, staff are facing a crisis of burnout from being pressured to deliver high-volume workloads, driving increasing costs for providers. Artificial intelligence (AI), in the form of conversational agents, presents a possible opportunity to enable efficiency in the delivery of care. Objective: This study aims to evaluate the effectiveness, usability, and acceptability of Dora, Ufonia's autonomous voice conversational agent: an AI-enabled autonomous telemedicine call for the detection of postoperative cataract surgery patients who require further assessment.
The objectives of this study are to establish Dora's efficacy in comparison with an expert clinician, determine baseline sensitivity and specificity for the detection of true complications, evaluate patient acceptability, collect evidence for cost-effectiveness, and capture data to support further development and evaluation. Methods: Using an implementation science construct, the interdisciplinary study will be a mixed methods phase 1 pilot establishing interobserver reliability of the system, usability, and acceptability. This will be done using the following scales and frameworks: the system usability scale; assessment of Health Information Technology Interventions in Evidence-Based Medicine Evaluation Framework; the telehealth usability questionnaire; and the Non-Adoption, Abandonment, and Challenges to the Scale-up, Spread and Suitability framework. Results: The evaluation is expected to show that conversational technology can be used to conduct an accurate assessment and that it is acceptable to different populations with different backgrounds. In addition, the results will demonstrate how successfully the system can be delivered in organizations with different clinical pathways and how it can be integrated with their existing platforms. Conclusions: The project's key contributions will be evidence of the effectiveness of AI voice conversational agents and their associated usability and acceptability. International Registered Report Identifier (IRRID): PRR1-10.2196/27227 UR - https://www.researchprotocols.org/2021/7/e27227 UR - http://dx.doi.org/10.2196/27227 UR - http://www.ncbi.nlm.nih.gov/pubmed/34319248 ID - info:doi/10.2196/27227 ER - TY - JOUR AU - Martinengo, Laura AU - Lo, W. Nicholas Y. AU - Goh, T. Westin I. W. 
AU - Tudor Car, Lorainne PY - 2021/7/21 TI - Choice of Behavioral Change Techniques in Health Care Conversational Agents: Protocol for a Scoping Review JO - JMIR Res Protoc SP - e30166 VL - 10 IS - 7 KW - behavior change KW - behavioral change technique KW - chatbot KW - conversational agent KW - health care KW - protocol KW - scoping review KW - long-term outcomes KW - behavior N2 - Background: Conversational agents or chatbots are computer programs that simulate conversations with users. Conversational agents are increasingly used for delivery of behavior change interventions in health care. Behavior change is complex and comprises the use of one or several components collectively known as behavioral change techniques (BCTs). Objective: The objective of this scoping review is to identify the BCTs that are used in behavior change–focused interventions delivered via conversational agents in health care. Methods: This scoping review will be performed in line with the Joanna Briggs Institute methodology and will be reported according to the PRISMA extension for scoping reviews guidelines. We will perform a comprehensive search of electronic databases and grey literature sources, and will check the reference lists of included studies for additional relevant studies. The screening and data extraction will be performed independently and in parallel by two review authors. Discrepancies will be resolved through consensus or discussion with a third review author. We will use a data extraction form congruent with the key themes and aims of this scoping review. BCTs employed in the included studies will be coded in line with BCT Taxonomy v1. We will analyze the data qualitatively and present it in diagrammatic or tabular form, alongside a narrative summary. Results: To date, we have designed the search strategy and performed the search on April 26, 2021. The first round of screening of retrieved articles is planned to begin soon. 
Conclusions: Using appropriate BCTs in the design and delivery of health care interventions via conversational agents is essential to improve long-term outcomes. Our findings will serve to inform the development of future interventions in this area. International Registered Report Identifier (IRRID): PRR1-10.2196/30166 UR - https://www.researchprotocols.org/2021/7/e30166 UR - http://dx.doi.org/10.2196/30166 UR - http://www.ncbi.nlm.nih.gov/pubmed/34287221 ID - info:doi/10.2196/30166 ER - TY - JOUR AU - Lederman, Reeva AU - D'Alfonso, Simon PY - 2021/7/20 TI - The Digital Therapeutic Alliance: Prospects and Considerations JO - JMIR Ment Health SP - e31385 VL - 8 IS - 7 KW - therapeutic alliance KW - digital therapeutic alliance KW - digital mental health KW - mental health apps KW - teletherapy KW - chatbots UR - https://mental.jmir.org/2021/7/e31385 UR - http://dx.doi.org/10.2196/31385 UR - http://www.ncbi.nlm.nih.gov/pubmed/34283035 ID - info:doi/10.2196/31385 ER - TY - JOUR AU - Rampioni, Margherita AU - Stara, Vera AU - Felici, Elisa AU - Rossi, Lorena AU - Paolini, Susy PY - 2021/7/16 TI - Embodied Conversational Agents for Patients With Dementia: Thematic Literature Analysis JO - JMIR Mhealth Uhealth SP - e25381 VL - 9 IS - 7 KW - dementia KW - patient with dementia KW - older adults with dementia KW - embodied conversational agent KW - virtual personal assistant KW - virtual agent KW - virtual companion KW - design for older adults KW - patients KW - elderly KW - virtual KW - personal assistant KW - cognitive KW - cognitive impairment N2 - Background: As the world's population rapidly ages, the number of older adults with cognitive impairment will also increase. Several studies have identified numerous complex needs of people with dementia, which assistive technologies still fail to support. 
Recent trends have led to an increasing focus on the use of embodied conversational agents (ECAs) as virtual entities able to interact with a person through natural and familiar verbal and nonverbal communication. The use of ECAs could improve the accessibility and acceptance of assistive technologies matching those high-level needs that are not well covered to date. Objective: The aim of this thematic literature analysis was to map current studies in the field of designing ECAs for patients with dementia in order to identify the existing research trend and possible gaps that need to be covered in the near future. The review questions in this study were as follows: (1) what research frameworks are used to study the interaction between patients with dementia and ECAs? (2) what are the findings? and (3) what are the barriers reported in these studies? Methods: Separate literature searches were conducted in PubMed, Web of Science, Scopus, and Embase databases by using specific umbrella phrases to target the population (patients with dementia) and the technology-based intervention (embodied conversational agent). Studies that met the inclusion criteria were appraised through the Mixed Methods Appraisal Tool and then discussed in a thematic analysis. Results: The search process identified 115 records from the databases and study references. After duplicates (n=45) were removed, 70 papers remained for the initial screening. A total of 7 studies were finally included in the qualitative synthesis. A thematic analysis of the reviewed studies identified major themes and subthemes: the research frameworks used to gather users' perspectives on ECAs (theme 1), the insights shared by the 7 studies as well as the value of user involvement in the development phases and the challenge of matching the system functionalities with the users' needs (theme 2), and the main methodological and technical problems faced by each study team (theme 3). 
Conclusions: Our thematic literature analysis shows that the field of ECAs is novel and poorly discussed in the scientific community and that more sophisticated study designs and proofs of efficacy of the approach are required. Therefore, by analyzing the main topic of the narrative review, this study underscores the challenge of synchronizing and harmonizing knowledge, efforts, and challenges in the dementia care field and its person-centered paradigm through the user-centered design approach. Enabling strict collaboration between interdisciplinary research networks, medical scientists, technology developers, patients, and their formal and informal caregivers is still a great challenge in the field of technologies for older adults. UR - https://mhealth.jmir.org/2021/7/e25381 UR - http://dx.doi.org/10.2196/25381 UR - http://www.ncbi.nlm.nih.gov/pubmed/34269686 ID - info:doi/10.2196/25381 ER - TY - JOUR AU - Stara, Vera AU - Vera, Benjamin AU - Bolliger, Daniel AU - Rossi, Lorena AU - Felici, Elisa AU - Di Rosa, Mirko AU - de Jong, Michiel AU - Paolini, Susy PY - 2021/6/19 TI - Usability and Acceptance of the Embodied Conversational Agent Anne by People With Dementia and Their Caregivers: Exploratory Study in Home Environment Settings JO - JMIR Mhealth Uhealth SP - e25891 VL - 9 IS - 6 KW - dementia KW - older adults with dementia KW - embodied conversational agent KW - virtual personal assistant KW - virtual agent KW - virtual companion KW - design for older adults with dementia N2 - Background: Information and communication technologies are tools that are able to support cognitive functions, monitor health and movements, provide reminders to maintain residual memory abilities, and promote social support, especially among patients with dementia. Among these technologies, embodied conversational agents (ECAs) are seen as screen-based entities designed to stimulate human face-to-face conversation skills, allowing for natural human-machine interaction. 
Unfortunately, the evidence that such agents deliver care benefits in supporting people affected by dementia and their caregivers has not yet been well studied. Therefore, research in this area is essential for the entire scientific community. Objective: This study aims to evaluate the usability and acceptability of the virtual agent Anne by people living with dementia. The study is also designed to assess the ability of target users to use the system independently and receive valuable information from it. Methods: We conducted a 4-week trial that involved 20 older adults living with dementia and 14 family caregivers in home environment settings in Italy. This study used a mixed methods approach, balancing quantitative and qualitative instruments to gather data from users. Telemetry data were also collected. Results: Older users were particularly engaged in providing significant responses and participating in system improvements. Some of them clearly discussed how technical problems related to speech recognition had a negative impact on the intention to use, adaptiveness, usefulness, and trust. Moreover, the usability of the system achieved an encouraging score, and half of the sample recognized a role of the agent Anne. This study confirms that the quality of automatic speech recognition and synthesis is still a technical issue and has room for improvement, whereas the touch screen modality is almost stable and positively used by patients with dementia. Conclusions: This study demonstrated the ability of target users to use the system independently in their home environment; overall, the involved participants shared good engagement with the system, approaching the virtual agents as a companion able to support memory and enjoyment needs. Therefore, this research provides data that sustain the use of ECAs as future eHealth systems that are able to address the basic and higher-level needs of people living with dementia. 
This specific field of research is novel and poorly discussed in the scientific community. This could be because of its novelty, yet there is an urgent need to strengthen data, research, and innovation to accelerate the implementation of ECAs as a future method to offer nonpharmacological support to community-dwelling people with dementia. UR - https://mhealth.jmir.org/2021/6/e25891/ UR - http://dx.doi.org/10.2196/25891 UR - http://www.ncbi.nlm.nih.gov/pubmed/34170256 ID - info:doi/10.2196/25891 ER - TY - JOUR AU - Beilharz, Francesca AU - Sukunesan, Suku AU - Rossell, L. Susan AU - Kulkarni, Jayashri AU - Sharp, Gemma PY - 2021/6/16 TI - Development of a Positive Body Image Chatbot (KIT) With Young People and Parents/Carers: Qualitative Focus Group Study JO - J Med Internet Res SP - e27807 VL - 23 IS - 6 KW - body image KW - eating disorder KW - chatbot KW - conversational agent KW - artificial intelligence KW - mental health KW - digital health KW - design N2 - Background: Body image and eating disorders represent a significant public health concern; however, many affected individuals never access appropriate treatment. Conversational agents or chatbots reflect a unique opportunity to target those affected online by providing psychoeducation and coping skills, thus filling the gap in service provision. Objective: A world-first body image chatbot called "KIT" was designed. The aim of this study was to assess preliminary acceptability and feasibility via the collection of qualitative feedback from young people and parents/carers regarding the content, structure, and design of the chatbot, in accordance with an agile methodology strategy. The chatbot was developed in collaboration with Australia's national eating disorder support organization, the Butterfly Foundation. Methods: A conversation decision tree was designed that offered psychoeducational information on body image and eating disorders, as well as evidence-based coping strategies. 
A version of KIT was built as a research prototype to deliver these conversations. Six focus groups were conducted using online semistructured interviews to seek feedback on the KIT prototype. This included four groups of people seeking help for themselves (n=17; age 13-18 years) and two groups of parents/carers (n=8; age 46-57 years). Participants provided feedback on the cartoon chatbot character design, as well as the content, structure, and design of the chatbot webchat. Results: Thematic analyses identified the following three main themes from the six focus groups: (1) chatbot character and design, (2) content presentation, and (3) flow. Overall, the participants provided positive feedback regarding KIT, with both young people and parents/carers generally providing similar reflections. The participants approved of KIT's character and engagement. Specific suggestions were made regarding the brevity and tone to increase KIT's interactivity. Conclusions: Focus groups provided overall positive qualitative feedback regarding the content, structure, and design of the body image chatbot. Incorporating the feedback of lived experience from both individuals and parents/carers allowed the refinement of KIT in the development phase as per an iterative agile methodology. Further research is required to evaluate KIT's efficacy. UR - https://www.jmir.org/2021/6/e27807 UR - http://dx.doi.org/10.2196/27807 UR - http://www.ncbi.nlm.nih.gov/pubmed/34132644 ID - info:doi/10.2196/27807 ER - TY - JOUR AU - Ruggiano, Nicole AU - Brown, L. Ellen AU - Roberts, Lisa AU - Framil Suarez, Victoria C. 
AU - Luo, Yan AU - Hao, Zhichao AU - Hristidis, Vagelis PY - 2021/6/3 TI - Chatbots to Support People With Dementia and Their Caregivers: Systematic Review of Functions and Quality JO - J Med Internet Res SP - e25006 VL - 23 IS - 6 KW - dementia KW - caregivers KW - chatbots KW - conversation agents KW - mobile apps KW - mobile phone N2 - Background: Over the past decade, there has been an increase in the use of information technologies to educate and support people with dementia and their family caregivers. At the same time, chatbot technologies have become increasingly popular for use by the public and have been identified as having benefits for health care delivery. However, little is known about how chatbot technologies may benefit people with dementia and their caregivers. Objective: This study aims to identify the types of current commercially available chatbots that are designed for use by people with dementia and their caregivers and to assess their quality in terms of features and content. Methods: Chatbots were identified through a systematic search on Google Play Store, Apple App Store, Alexa Skills, and the internet. An evidence-based assessment tool was used to evaluate the features and content of the identified apps. The assessment was conducted through interrater agreement among 4 separate reviewers. Results: Of the 505 initial chatbots identified, 6 were included in the review. The chatbots assessed varied significantly in terms of content and scope. Although the chatbots were generally found to be easy to use, some limitations were noted regarding their performance and programmed content for dialog. Conclusions: Although chatbot technologies are well established and commonly used by the public, their development for people with dementia and their caregivers is in its infancy. Given the successful use of chatbots in other health care settings and for other applications, there are opportunities to integrate this technology into dementia care. 
However, more evidence-based chatbots that have undergone end user evaluation are needed to evaluate their potential to adequately educate and support these populations. UR - https://www.jmir.org/2021/6/e25006 UR - http://dx.doi.org/10.2196/25006 UR - http://www.ncbi.nlm.nih.gov/pubmed/34081019 ID - info:doi/10.2196/25006 ER - TY - JOUR AU - Gabrielli, Silvia AU - Rizzi, Silvia AU - Bassi, Giulia AU - Carbone, Sara AU - Maimone, Rosa AU - Marchesoni, Michele AU - Forti, Stefano PY - 2021/5/28 TI - Engagement and Effectiveness of a Healthy-Coping Intervention via Chatbot for University Students During the COVID-19 Pandemic: Mixed Methods Proof-of-Concept Study JO - JMIR Mhealth Uhealth SP - e27965 VL - 9 IS - 5 KW - mobile mental health KW - chatbots KW - anxiety KW - stress KW - university students KW - digital health KW - healthy-coping intervention KW - COVID-19 N2 - Background: University students are increasingly reporting common mental health problems, such as stress, anxiety, and depression, and they frequently face barriers to seeking psychological support because of stigma, cost, and availability of mental health services. This issue is even more critical in the challenging time of the COVID-19 pandemic. Digital mental health interventions, such as those delivered via chatbots on mobile devices, offer the potential to achieve scalability of healthy-coping interventions by lowering cost and supporting prevention. Objective: The goal of this study was to conduct a proof-of-concept evaluation measuring the engagement and effectiveness of Atena, a psychoeducational chatbot supporting healthy coping with stress and anxiety, among a population of university students. Methods: In a proof-of-concept study, 71 university students were recruited during the COVID-19 pandemic; 68% (48/71) were female, they were all in their first year of university, and their mean age was 20.6 years (SD 2.4). 
Enrolled students were asked to use the Atena psychoeducational chatbot for 4 weeks (eight sessions; two per week), which provided healthy-coping strategies based on cognitive behavioral therapy, positive psychology, and mindfulness techniques. The intervention program consisted of conversations combined with audiovisual clips delivered via the Atena chatbot. Participants were asked to complete web-based versions of the 7-item Generalized Anxiety Disorder scale (GAD-7), the 10-item Perceived Stress Scale (PSS-10), and the Five-Facet Mindfulness Questionnaire (FFMQ) at baseline and postintervention to assess effectiveness. They were also asked to complete the User Engagement Scale–Short Form at week 2 to assess engagement with the chatbot and to provide qualitative comments on their overall experience with Atena postintervention. Results: Participants engaged with the Atena chatbot an average of 78 (SD 24.8) times over the study period. A total of 61 out of 71 (86%) participants completed the first 2 weeks of the intervention and provided data on engagement (10/71, 14% attrition). A total of 41 participants out of 71 (58%) completed the full intervention and the postintervention questionnaires (30/71, 42% attrition). Results from the completer analysis showed a significant decrease in anxiety symptoms for participants in more extreme GAD-7 score ranges (t39=0.94; P=.009) and a decrease in stress symptoms as measured by the PSS-10 (t39=2.00; P=.05) for all participants postintervention. Participants also improved significantly in the describing and nonjudging facets, based on their FFMQ subscale scores, and asked for some improvements in the user experience with the chatbot. Conclusions: This study shows the benefit of deploying a digital healthy-coping intervention via a chatbot to support university students experiencing higher levels of distress. 
While findings collected during the COVID-19 pandemic show promise, further research is required to confirm conclusions. UR - https://mhealth.jmir.org/2021/5/e27965 UR - http://dx.doi.org/10.2196/27965 UR - http://www.ncbi.nlm.nih.gov/pubmed/33950849 ID - info:doi/10.2196/27965 ER - TY - JOUR AU - Gross, Christoph AU - Schachner, Theresa AU - Hasl, Andrea AU - Kohlbrenner, Dario AU - Clarenbach, F. Christian AU - Wangenheim, V. Florian AU - Kowatsch, Tobias PY - 2021/5/26 TI - Personalization of Conversational Agent-Patient Interaction Styles for Chronic Disease Management: Two Consecutive Cross-sectional Questionnaire Studies JO - J Med Internet Res SP - e26643 VL - 23 IS - 5 KW - conversational agents KW - chatbots KW - human-computer interaction KW - physician-patient interaction styles KW - deliberative interaction KW - paternalistic interaction KW - digital health KW - chronic conditions KW - disease management KW - COPD KW - chronic obstructive pulmonary disease N2 - Background: Conversational agents (CAs) for chronic disease management are receiving increasing attention in academia and the industry. However, long-term adherence to CAs is still a challenge and needs to be explored. Personalization of CAs has the potential to improve long-term adherence and, with it, user satisfaction, task efficiency, perceived benefits, and intended behavior change. Research on personalized CAs has already addressed different aspects, such as personalized recommendations and anthropomorphic cues. However, detailed information on interaction styles between patients and CAs in the role of medical health care professionals is scant. Such interaction styles play essential roles for patient satisfaction, treatment adherence, and outcome, as has been shown for physician-patient interactions. 
Currently, it is not clear (1) whether chronically ill patients prefer a CA with a paternalistic, informative, interpretive, or deliberative interaction style, and (2) which factors influence these preferences. Objective: We aimed to investigate the preferences of chronically ill patients for CA-delivered interaction styles. Methods: We conducted two studies. The first study included a paper-based approach and explored the preferences of chronic obstructive pulmonary disease (COPD) patients for paternalistic, informative, interpretive, and deliberative CA-delivered interaction styles. Based on these results, a second study assessed the effects of the paternalistic and deliberative interaction styles on the relationship quality between the CA and patients via hierarchical multiple linear regression analyses in an online experiment with COPD patients. Patients' sociodemographic and disease-specific characteristics served as moderator variables. Results: Study 1 with 117 COPD patients revealed a preference for the deliberative (50/117) and informative (34/117) interaction styles across demographic characteristics. All patients who preferred the paternalistic style over the other interaction styles had more severe COPD (three patients, Global Initiative for Chronic Obstructive Lung Disease class 3 or 4). In Study 2 with 123 newly recruited COPD patients, younger participants and participants with a less recent COPD diagnosis scored higher on interaction-related outcomes when interacting with a CA that delivered the deliberative interaction style (interaction between age and CA type: relationship quality: b=−0.77, 95% CI −1.37 to −0.18; intention to continue interaction: b=−0.49, 95% CI −0.97 to −0.01; working alliance attachment bond: b=−0.65, 95% CI −1.26 to −0.04; working alliance goal agreement: b=−0.59, 95% CI −1.18 to −0.01; interaction between recency of COPD diagnosis and CA type: working alliance goal agreement: b=0.57, 95% CI 0.01 to 1.13). 
Conclusions: Our results indicate that age and a patient's personal disease experience inform which CA interaction style the patient should be paired with to achieve increased interaction-related outcomes with the CA. These results allow the design of personalized health care CAs with the goal to increase long-term adherence to health-promoting behavior. UR - https://www.jmir.org/2021/5/e26643 UR - http://dx.doi.org/10.2196/26643 UR - http://www.ncbi.nlm.nih.gov/pubmed/33913814 ID - info:doi/10.2196/26643 ER - TY - JOUR AU - Jadczyk, Tomasz AU - Wojakowski, Wojciech AU - Tendera, Michal AU - Henry, D. Timothy AU - Egnaczyk, Gregory AU - Shreenivas, Satya PY - 2021/5/25 TI - Artificial Intelligence Can Improve Patient Management at the Time of a Pandemic: The Role of Voice Technology JO - J Med Internet Res SP - e22959 VL - 23 IS - 5 KW - artificial intelligence KW - conversational agent KW - COVID-19 KW - virtual care KW - voice assistant KW - voice chatbot UR - https://www.jmir.org/2021/5/e22959 UR - http://dx.doi.org/10.2196/22959 UR - http://www.ncbi.nlm.nih.gov/pubmed/33999834 ID - info:doi/10.2196/22959 ER - TY - JOUR AU - Munsch, Nicolas AU - Martin, Alistair AU - Gruarin, Stefanie AU - Nateqi, Jama AU - Abdarahmane, Isselmou AU - Weingartner-Ortner, Rafael AU - Knapp, Bernhard PY - 2021/5/21 TI - Authors' Reply to: Screening Tools: Their Intended Audiences and Purposes. Comment on "Diagnostic Accuracy of Web-Based COVID-19 Symptom Checkers: Comparison Study" JO - J Med Internet Res SP - e26543 VL - 23 IS - 5 KW - COVID-19 KW - symptom checkers KW - benchmark KW - digital health KW - symptom KW - chatbot KW - accuracy UR - https://www.jmir.org/2021/5/e26543 UR - http://dx.doi.org/10.2196/26543 UR - http://www.ncbi.nlm.nih.gov/pubmed/33989162 ID - info:doi/10.2196/26543 ER - TY - JOUR AU - Millen, Elizabeth AU - Gilsdorf, Andreas AU - Fenech, Matthew AU - Gilbert, Stephen PY - 2021/5/21 TI - Screening Tools: Their Intended Audiences and Purposes. 
Comment on "Diagnostic Accuracy of Web-Based COVID-19 Symptom Checkers: Comparison Study" JO - J Med Internet Res SP - e26148 VL - 23 IS - 5 KW - COVID-19 KW - symptom checkers KW - benchmark KW - digital health KW - symptom KW - chatbot KW - accuracy UR - https://www.jmir.org/2021/5/e26148 UR - http://dx.doi.org/10.2196/26148 UR - http://www.ncbi.nlm.nih.gov/pubmed/33989169 ID - info:doi/10.2196/26148 ER - TY - JOUR AU - Darcy, Alison AU - Daniels, Jade AU - Salinger, David AU - Wicks, Paul AU - Robinson, Athena PY - 2021/5/11 TI - Evidence of Human-Level Bonds Established With a Digital Conversational Agent: Cross-sectional, Retrospective Observational Study JO - JMIR Form Res SP - e27868 VL - 5 IS - 5 KW - conversational agents KW - mobile mental health KW - chatbots KW - depression KW - anxiety KW - digital health N2 - Background: There are far more patients in mental distress than there is time available for mental health professionals to support them. Although digital tools may help mitigate this issue, critics have suggested that technological solutions that lack human empathy will prevent a bond or therapeutic alliance from being formed, thereby narrowing these solutions' efficacy. Objective: We aimed to investigate whether users of a cognitive behavioral therapy (CBT)–based conversational agent would report therapeutic bond levels that are similar to those in literature about other CBT modalities, including face-to-face therapy, group CBT, and other digital interventions that do not use a conversational agent. Methods: A cross-sectional, retrospective study design was used to analyze aggregate, deidentified data from adult users who self-referred to a CBT-based, fully automated conversational agent (Woebot) between November 2019 and August 2020. Working alliance was measured with the Working Alliance Inventory-Short Revised (WAI-SR), and depression symptom status was assessed by using the 2-item Patient Health Questionnaire (PHQ-2). 
All measures were administered by the conversational agent in the mobile app. WAI-SR scores were compared to those in scientific literature abstracted from recent reviews. Results: Data from 36,070 Woebot users were included in the analysis. Participants ranged in age from 18 to 78 years, and 57.48% (20,734/36,070) of participants reported that they were female. The mean PHQ-2 score was 3.03 (SD 1.79), and 54.67% (19,719/36,070) of users scored over the cutoff score of 3 for depression screening. Within 5 days of initial app use, the mean WAI-SR score was 3.36 (SD 0.8) and the mean bond subscale score was 3.8 (SD 1.0), which was comparable to those in recent studies from the literature on traditional, outpatient, individual CBT and group CBT (mean bond subscale scores of 4 and 3.8, respectively). PHQ-2 scores at baseline weakly correlated with bond scores (r=−0.04; P<.001); however, users with depression and those without depression had high bond scores of 3.45. Conclusions: Although bonds are often presumed to be the exclusive domain of human therapeutic relationships, our findings challenge the notion that digital therapeutics are incapable of establishing a therapeutic bond with users. Future research might investigate the role of bonds as mediators of clinical outcomes, since boosting the engagement and efficacy of digital therapeutics could have major public health benefits. 
UR - https://formative.jmir.org/2021/5/e27868 UR - http://dx.doi.org/10.2196/27868 UR - http://www.ncbi.nlm.nih.gov/pubmed/33973854 ID - info:doi/10.2196/27868 ER - TY - JOUR AU - Lee, Hyeonhoon AU - Kang, Jaehyun AU - Yeo, Jonghyeon PY - 2021/5/6 TI - Medical Specialty Recommendations by an Artificial Intelligence Chatbot on a Smartphone: Development and Deployment JO - J Med Internet Res SP - e27460 VL - 23 IS - 5 KW - artificial intelligence KW - chatbot KW - COVID-19 KW - deep learning KW - deployment KW - development KW - machine learning KW - medical specialty KW - natural language processing KW - recommendation KW - smartphone N2 - Background: The COVID-19 pandemic has limited daily activities and even contact between patients and primary care providers. This makes it more difficult to provide adequate primary care services, which include connecting patients to an appropriate medical specialist. A smartphone-compatible artificial intelligence (AI) chatbot that classifies patients' symptoms and recommends the appropriate medical specialty could provide a valuable solution. Objective: In order to establish a contactless method of recommending the appropriate medical specialty, this study aimed to construct a deep learning–based natural language processing (NLP) pipeline and to develop an AI chatbot that can be used on a smartphone. Methods: We collected 118,008 sentences containing information on symptoms with labels (medical specialty), conducted data cleansing, and finally constructed a pipeline of 51,134 sentences for this study. Several deep learning models, including 4 different long short-term memory (LSTM) models with or without attention and with or without a pretrained FastText embedding layer, as well as bidirectional encoder representations from transformers for NLP, were trained and validated using a randomly selected test data set. 
The performance of the models was evaluated on the basis of the precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC). An AI chatbot was also designed to make it easy for patients to use this specialty recommendation system. We used an open-source framework called "Alpha" to develop our AI chatbot. This takes the form of a web-based app with a frontend chat interface capable of conversing in text and a backend cloud-based server application to handle data collection, process the data with a deep learning model, and offer the medical specialty recommendation in a responsive web app that is compatible with both desktops and smartphones. Results: The bidirectional encoder representations from transformers model yielded the best performance, with an AUC of 0.964 and an F1-score of 0.768, followed by the LSTM model with embedding vectors, with an AUC of 0.965 and an F1-score of 0.739. Considering the limitations of computing resources and the wide availability of smartphones, the LSTM model with embedding vectors trained on our data set was adopted for our AI chatbot service. We also deployed an Alpha version of the AI chatbot to be executed on both desktops and smartphones. Conclusions: With the increasing need for telemedicine during the current COVID-19 pandemic, an AI chatbot with a deep learning–based NLP model that can recommend a medical specialty to patients through their smartphones would be exceedingly useful. This chatbot allows patients to identify the proper medical specialist in a rapid and contactless manner, based on their symptoms, thus potentially supporting both patients and primary care providers. 
UR - https://www.jmir.org/2021/5/e27460 UR - http://dx.doi.org/10.2196/27460 UR - http://www.ncbi.nlm.nih.gov/pubmed/33882012 ID - info:doi/10.2196/27460 ER - TY - JOUR AU - Hungerbuehler, Ines AU - Daley, Kate AU - Cavanagh, Kate AU - Garcia Claro, Heloísa AU - Kapps, Michael PY - 2021/4/21 TI - Chatbot-Based Assessment of Employees' Mental Health: Design Process and Pilot Implementation JO - JMIR Form Res SP - e21678 VL - 5 IS - 4 KW - chatbot KW - conversational agent KW - online KW - digital health KW - mobile phone KW - mental health KW - workplace KW - work stress KW - survey KW - response rate N2 - Background: Stress, burnout, and mental health problems such as depression and anxiety are common, and can significantly impact workplaces through absenteeism and reduced productivity. To address this issue, organizations must first understand the extent of the difficulties by mapping the mental health of their workforce. Online surveys are a cost-effective and scalable approach to achieve this but typically have low response rates, in part due to a lack of interactivity. Chatbots offer one potential solution, enhancing engagement through simulated natural human conversation and use of interactive features. Objective: The aim of this study was to explore if a text-based chatbot is a feasible approach to engage and motivate employees to complete a workplace mental health assessment. This paper describes the design process and results of a pilot implementation. Methods: A fully automated chatbot ("Viki") was developed to evaluate employee risks of suffering from depression, anxiety, stress, insomnia, burnout, and work-related stress. Viki uses a conversation style and gamification features to enhance engagement. A cross-sectional analysis was performed to gain first insights into a pilot implementation within a small to medium–sized enterprise (120 employees). Results: The response rate was 64.2% (77/120). 
In total, 98 employees started the assessment, 77 of whom (79%) completed it. The majority of participants scored in the mild range for anxiety (20/40, 50%) and depression (16/28, 57%), in the moderate range for stress (10/22, 46%), and at the subthreshold level for insomnia (14/20, 70%) as defined by their questionnaire scores. Conclusions: A chatbot-based workplace mental health assessment seems to be a highly engaging and effective way to collect anonymized mental health data among employees with response rates comparable to those of face-to-face interviews. UR - https://formative.jmir.org/2021/4/e21678 UR - http://dx.doi.org/10.2196/21678 UR - http://www.ncbi.nlm.nih.gov/pubmed/33881403 ID - info:doi/10.2196/21678 ER - TY - JOUR AU - Asensio-Cuesta, Sabina AU - Blanes-Selva, Vicent AU - Conejero, Alberto J. AU - Frigola, Ana AU - Portolés, G. Manuel AU - Merino-Torres, Francisco Juan AU - Rubio Almanza, Matilde AU - Syed-Abdul, Shabbir AU - Li, (Jack) Yu-Chuan AU - Vilar-Mateo, Ruth AU - Fernandez-Luque, Luis AU - García-Gómez, M. Juan PY - 2021/4/14 TI - A User-Centered Chatbot (Wakamola) to Collect Linked Data in Population Networks to Support Studies of Overweight and Obesity Causes: Design and Pilot Study JO - JMIR Med Inform SP - e17503 VL - 9 IS - 4 KW - mHealth KW - obesity KW - overweight KW - chatbot KW - assessment KW - public health KW - Telegram KW - user-centered design KW - Social Network Analysis N2 - Background: Obesity and overweight are a serious health problem worldwide with multiple and connected causes. Simultaneously, chatbots are becoming increasingly popular as a way to interact with users in mobile health apps. Objective: This study reports the user-centered design and feasibility study of a chatbot to collect linked data to support the study of individual and social overweight and obesity causes in populations. Methods: We first studied the users' needs and gathered users' 
graphical preferences through an open survey on 52 wireframes designed by 150 design students; it also included questions about sociodemographics, diet and activity habits, the need for overweight and obesity apps, and desired functionality. We also interviewed an expert panel. We then designed and developed a chatbot. Finally, we conducted a pilot study to test feasibility. Results: We collected 452 answers to the survey and interviewed 4 specialists. Based on this research, we developed a Telegram chatbot named Wakamola structured in six sections: personal, diet, physical activity, social network, user's status score, and project information. We defined a user's status score as a normalized sum (0-100) of scores about diet (frequency of eating 50 foods), physical activity, BMI, and social network. We performed a pilot to evaluate the chatbot implementation among 85 healthy volunteers. Of 74 participants who completed all sections, we found 8 underweight people (11%), 5 overweight people (7%), and no obesity cases. The mean BMI was 21.4 kg/m2 (normal weight). The most consumed foods were olive oil, milk and derivatives, cereals, vegetables, and fruits. People walked 10 minutes on 5.8 days per week, slept 7.02 hours per day, and were sitting 30.57 hours per week. Moreover, we were able to create a social network with 74 users, 178 relations, and 12 communities. Conclusions: The Telegram chatbot Wakamola is a feasible tool to collect data from a population about sociodemographics, diet patterns, physical activity, BMI, and specific diseases. In addition, the chatbot allows the connection of users in a social network to study overweight and obesity causes from both individual and social perspectives. 
UR - https://medinform.jmir.org/2021/4/e17503 UR - http://dx.doi.org/10.2196/17503 UR - http://www.ncbi.nlm.nih.gov/pubmed/33851934 ID - info:doi/10.2196/17503 ER - TY - JOUR AU - Bérubé, Caterina AU - Schachner, Theresa AU - Keller, Roman AU - Fleisch, Elgar AU - v Wangenheim, Florian AU - Barata, Filipe AU - Kowatsch, Tobias PY - 2021/3/29 TI - Voice-Based Conversational Agents for the Prevention and Management of Chronic and Mental Health Conditions: Systematic Literature Review JO - J Med Internet Res SP - e25933 VL - 23 IS - 3 KW - voice KW - speech KW - delivery of health care KW - noncommunicable diseases KW - conversational agents KW - mobile phone KW - smart speaker KW - monitoring KW - support KW - chronic disease KW - mental health KW - systematic literature review N2 - Background: Chronic and mental health conditions are increasingly prevalent worldwide. As devices in our everyday lives offer more and more voice-based self-service, voice-based conversational agents (VCAs) have the potential to support the prevention and management of these conditions in a scalable manner. However, evidence on VCAs dedicated to the prevention and management of chronic and mental health conditions is unclear. Objective: This study provides a better understanding of the current methods used in the evaluation of health interventions for the prevention and management of chronic and mental health conditions delivered through VCAs. Methods: We conducted a systematic literature review using PubMed MEDLINE, Embase, PsycINFO, Scopus, and Web of Science databases. We included primary research involving the prevention or management of chronic or mental health conditions through a VCA and reporting an empirical evaluation of the system either in terms of system accuracy, technology acceptance, or both. A total of 2 independent reviewers conducted the screening and data extraction, and agreement between them was measured using Cohen kappa. 
A narrative approach was used to synthesize the selected records. Results: Of 7170 prescreened papers, 12 met the inclusion criteria. All studies were nonexperimental. The VCAs provided behavioral support (n=5), health monitoring services (n=3), or both (n=4). The interventions were delivered via smartphones (n=5), tablets (n=2), or smart speakers (n=3). In 2 cases, no device was specified. A total of 3 VCAs targeted cancer, whereas 2 VCAs targeted diabetes and heart failure. The other VCAs targeted hearing impairment, asthma, Parkinson disease, dementia, autism, intellectual disability, and depression. The majority of the studies (n=7) assessed technology acceptance, but only a few studies (n=3) used validated instruments. Half of the studies (n=6) reported either performance measures on speech recognition or on the ability of VCAs to respond to health-related queries. Only a minority of the studies (n=2) reported behavioral measures or a measure of attitudes toward intervention-targeted health behavior. Moreover, only a minority of studies (n=4) reported controlling for participants' previous experience with technology. Finally, the risk of bias varied markedly. Conclusions: The heterogeneity in the methods, the limited number of studies identified, and the high risk of bias show that research on VCAs for chronic and mental health conditions is still in its infancy. Although the results of system accuracy and technology acceptance are encouraging, there is still a need to establish more conclusive evidence on the efficacy of VCAs for the prevention and management of chronic and mental health conditions, both in absolute terms and in comparison with standard health care. UR - https://www.jmir.org/2021/3/e25933 UR - http://dx.doi.org/10.2196/25933 UR - http://www.ncbi.nlm.nih.gov/pubmed/33658174 ID - info:doi/10.2196/25933 ER - TY - JOUR AU - Prochaska, J. Judith AU - Vogel, A. 
Erin AU - Chieng, Amy AU - Kendra, Matthew AU - Baiocchi, Michael AU - Pajarito, Sarah AU - Robinson, Athena PY - 2021/3/23 TI - A Therapeutic Relational Agent for Reducing Problematic Substance Use (Woebot): Development and Usability Study JO - J Med Internet Res SP - e24850 VL - 23 IS - 3 KW - artificial intelligence KW - conversational agent KW - chatbot KW - addiction KW - substance misuse KW - treatment KW - acceptability KW - feasibility KW - craving KW - psychoeducation KW - psychotherapeutic KW - mobile phone N2 - Background: Misuse of substances is common, can be serious and costly to society, and often goes untreated due to barriers to accessing care. Woebot is a mental health digital solution informed by cognitive behavioral therapy and built upon an artificial intelligence–driven platform to deliver tailored content to users. In a previous 2-week randomized controlled trial, Woebot alleviated depressive symptoms. Objective: This study aims to adapt Woebot for the treatment of substance use disorders (W-SUDs) and examine its feasibility, acceptability, and preliminary efficacy. Methods: American adults (aged 18-65 years) who screened positive for substance misuse without major health contraindications were recruited from online sources and flyers and enrolled between March 27 and May 6, 2020. In a single-group pre/postdesign, all participants received W-SUDs for 8 weeks. W-SUDs provided mood, craving, and pain tracking and modules (psychoeducational lessons and psychotherapeutic tools) using elements of dialectical behavior therapy and motivational interviewing. Paired samples t tests and McNemar nonparametric tests were used to examine within-subject changes from pre- to posttreatment on measures of substance use, confidence, cravings, mood, and pain. Results: The sample (N=101) had a mean age of 36.8 years (SD 10.0), and 75.2% (76/101) of the participants were female, 78.2% (79/101) were non-Hispanic White, and 72.3% (73/101) were employed. 
Participants' W-SUDs use averaged 15.7 (SD 14.2) days, 12.1 (SD 8.3) modules, and 600.7 (SD 556.5) sent messages. About 94% (562/598) of all completed psychoeducational lessons were rated positively. From treatment start to end, in-app craving ratings were reduced by half (87/101, 86.1% reporting cravings in the app; odds ratio 0.48, 95% CI 0.32-0.73). Posttreatment assessment completion was 50.5% (51/101), with better retention among those who initially screened higher on substance misuse. From pre- to posttreatment, confidence to resist urges to use substances significantly increased (mean score change +16.9, SD 21.4; P<.001), whereas past month substance use occasions (mean change −9.3, SD 14.1; P<.001) and scores on the Alcohol Use Disorders Identification Test-Concise (mean change −1.3, SD 2.6; P<.001), 10-item Drug Abuse Screening Test (mean change −1.2, SD 2.0; P<.001), Patient Health Questionnaire-8 item (mean change −2.1, SD 5.2; P=.005), Generalized Anxiety Disorder-7 (mean change −2.3, SD 4.7; P=.001), and cravings scale (68.6% vs 47.1% moderate to extreme; P=.01) significantly decreased. Most participants would recommend W-SUDs to a friend (39/51, 76%) and reported receiving the service they desired (41/51, 80%). Fewer felt W-SUDs met most or all of their needs (22/51, 43%). Conclusions: W-SUDs was feasible to deliver, engaging, and acceptable and was associated with significant improvements in substance use, confidence, cravings, depression, and anxiety. Study attrition was high. Future research will evaluate W-SUDs in a randomized controlled trial with a more diverse sample and with the use of greater study retention strategies. Trial Registration: ClinicalTrials.gov NCT04096001; http://clinicaltrials.gov/ct2/show/NCT04096001. 
UR - https://www.jmir.org/2021/3/e24850 UR - http://dx.doi.org/10.2196/24850 UR - http://www.ncbi.nlm.nih.gov/pubmed/33755028 ID - info:doi/10.2196/24850 ER - TY - JOUR AU - Mariamo, Audrey AU - Temcheff, Elizabeth Caroline AU - Léger, Pierre-Majorique AU - Senecal, Sylvain AU - Lau, Alexandra Marianne PY - 2021/3/18 TI - Emotional Reactions and Likelihood of Response to Questions Designed for a Mental Health Chatbot Among Adolescents: Experimental Study JO - JMIR Hum Factors SP - e24343 VL - 8 IS - 1 KW - chatbots KW - conversational agents KW - mental health KW - well-being KW - adolescents KW - user experience KW - user preferences N2 - Background: Psychological distress increases across adolescence and has been associated with several important health outcomes with consequences that can extend into adulthood. One type of technological innovation that may serve as a unique intervention for youth experiencing psychological distress is the conversational agent, otherwise known as a chatbot. Further research is needed on the factors that may make mental health chatbots intended for adolescents more appealing and increase the likelihood that adolescents will use them. Objective: The aim of this study was to assess adolescents' emotional reactions and likelihood of responding to questions that could be posed by a mental health chatbot. Understanding adolescent preferences and factors that could increase adolescents' likelihood of responding to chatbot questions could assist in future mental health chatbot design intended for youth. Methods: We recruited 19 adolescents aged 14 to 17 years to participate in a study with a 2×2×3 within-subjects factorial design. Each participant was sequentially presented with 96 chatbot questions for a duration of 8 seconds per question. Following each presentation, participants were asked to indicate how likely they were to respond to the question, as well as their perceived affective reaction to the question. 
Demographic data were collected, and an informal debriefing was conducted with each participant. Results: Participants were an average of 15.3 years old (SD 1.00) and mostly female (11/19, 58%). Logistic regressions showed that the presence of GIFs predicted perceived emotional valence (β=−.40, P<.001), such that questions without GIFs were associated with a negative perceived emotional valence. Question type predicted emotional valence, such that yes/no questions (β=−.23, P=.03) and open-ended questions (β=−.26, P=.01) were associated with a negative perceived emotional valence compared to multiple response choice questions. Question type also predicted the likelihood of response, such that yes/no questions were associated with a lower likelihood of response compared to multiple response choice questions (β=−.24, P=.03) and a higher likelihood of response compared to open-ended questions (β=.54, P<.001). Conclusions: The findings of this study add to the rapidly growing field of teen-computer interaction and contribute to our understanding of adolescent user experience in their interactions with a mental health chatbot. The insights gained from this study may be of assistance to developers and designers of mental health chatbots. UR - https://humanfactors.jmir.org/2021/1/e24343 UR - http://dx.doi.org/10.2196/24343 UR - http://www.ncbi.nlm.nih.gov/pubmed/33734089 ID - info:doi/10.2196/24343 ER - TY - JOUR AU - Kataoka, Yuki AU - Takemura, Tomoyasu AU - Sasajima, Munehiko AU - Katoh, Naoki PY - 2021/3/10 TI - Development and Early Feasibility of Chatbots for Educating Patients With Lung Cancer and Their Caregivers in Japan: Mixed Methods Study JO - JMIR Cancer SP - e26911 VL - 7 IS - 1 KW - cancer KW - caregivers KW - chatbot KW - lung cancer KW - mixed methods approach KW - online health KW - patients KW - symptom management education KW - web-based platform N2 - Background: Chatbots are artificial intelligence–driven programs that interact with people. 
The applications of this technology include the collection and delivery of information, the generation of and response to inquiries, the collection of end user feedback, and the delivery of personalized health and medical information to patients through cellphone- and web-based platforms. However, no chatbots have been developed for patients with lung cancer and their caregivers. Objective: This study aimed to develop and evaluate the early feasibility of a chatbot designed to improve the knowledge of symptom management among patients with lung cancer in Japan and their caregivers. Methods: We conducted a sequential mixed methods study that included a web-based anonymized questionnaire survey administered to physicians and paramedics from June to July 2019 (phase 1). Two physicians conducted a content analysis of the questionnaire to curate frequently asked questions (FAQs; phase 2). Based on these FAQs, we developed and integrated a chatbot into a social network service (phase 3). The physicians and paramedics involved in phase 1 then tested this chatbot (α test; phase 4). Thereafter, patients with lung cancer and their caregivers tested this chatbot (β test; phase 5). Results: We obtained 246 questions from 15 health care providers in phase 1. We curated 91 FAQs and their corresponding responses in phase 2. In total, 11 patients and 1 caregiver participated in the β test in phase 5. The participants asked the chatbot 60 questions, 8 (13%) of which did not match the appropriate categories. After the β test, 7 (64%) participants responded to the postexperimental questionnaire. The mean satisfaction score was 2.7 (SD 0.5) points out of 5. Conclusions: Medical staff providing care to patients with lung cancer can use the categories specified in this chatbot to educate patients on how they can manage their symptoms. Further studies are required to improve chatbots in terms of interaction with patients. 
UR - https://cancer.jmir.org/2021/1/e26911 UR - http://dx.doi.org/10.2196/26911 UR - http://www.ncbi.nlm.nih.gov/pubmed/33688839 ID - info:doi/10.2196/26911 ER - TY - JOUR AU - Chung, Kyungmi AU - Cho, Young Hee AU - Park, Young Jin PY - 2021/3/3 TI - A Chatbot for Perinatal Women's and Partners' Obstetric and Mental Health Care: Development and Usability Evaluation Study JO - JMIR Med Inform SP - e18607 VL - 9 IS - 3 KW - chatbot KW - mobile phone KW - instant messaging KW - mobile health KW - perinatal care KW - usability KW - user experience KW - usability testing N2 - Background: To motivate people to adopt medical chatbots, the establishment of a specialized medical knowledge database that fits their personal interests is of great importance in developing a chatbot for perinatal care, particularly with the help of health professionals. Objective: The objectives of this study are to develop and evaluate a user-friendly question-and-answer (Q&A) knowledge database–based chatbot (Dr. Joy) for perinatal women's and their partners' obstetric and mental health care by applying a text-mining technique and implementing contextual usability testing (UT), respectively, thus determining whether this medical chatbot built on mobile instant messenger (KakaoTalk) can provide its male and female users with good user experience. Methods: Two men aged 38 and 40 years and 13 women aged 27 to 43 years in pregnancy preparation or different pregnancy stages were enrolled. All participants completed the 7-day-long UT, during which they were given the daily tasks of asking Dr. Joy at least 3 questions at any time and place and then giving the chatbot either positive or negative feedback with emoji, using at least one feature of the chatbot, and finally, sending a facilitator all screenshots for the history of the day's use via KakaoTalk before midnight. 
One day after the UT completion, all participants were asked to fill out a questionnaire on the evaluation of usability, perceived benefits and risks, intention to seek and share health information on the chatbot, and strengths and weaknesses of its use, as well as demographic characteristics. Results: Despite the relatively higher score of ease of learning (EOL), the results of the Spearman correlation indicated that EOL was not significantly associated with usefulness (ρ=0.26; P=.36), ease of use (ρ=0.19; P=.51), satisfaction (ρ=0.21; P=.46), or total usability scores (ρ=0.32; P=.24). Unlike EOL, all 3 subfactors and the total usability had significant positive associations with each other (all ρ>0.80; P<.001). Furthermore, perceived risks exhibited no significant negative associations with perceived benefits (ρ=−0.29; P=.30) or intention to seek (SEE; ρ=−0.28; P=.32) or share (SHA; ρ=−0.24; P=.40) health information on the chatbot via KakaoTalk, whereas perceived benefits exhibited significant positive associations with both SEE and SHA. Perceived benefits were more strongly associated with SEE (ρ=0.94; P<.001) than with SHA (ρ=0.70; P=.004). Conclusions: This study provides the potential for the uptake of this newly developed Q&A knowledge database–based KakaoTalk chatbot for obstetric and mental health care. As Dr. Joy had quality contents with both utilitarian and hedonic value, its male and female users could be encouraged to use medical chatbots in a convenient, easy-to-use, and enjoyable manner. To boost their continued usage intention for Dr. Joy, its Q&A sets need to be periodically updated to satisfy user intent by monitoring both male and female user utterances. 
UR - https://medinform.jmir.org/2021/3/e18607 UR - http://dx.doi.org/10.2196/18607 UR - http://www.ncbi.nlm.nih.gov/pubmed/33656442 ID - info:doi/10.2196/18607 ER - TY - JOUR AU - Kowatsch, Tobias AU - Lohse, Kim-Morgaine AU - Erb, Valérie AU - Schittenhelm, Leo AU - Galliker, Helen AU - Lehner, Rea AU - Huang, M. Elaine PY - 2021/2/22 TI - Hybrid Ubiquitous Coaching With a Novel Combination of Mobile and Holographic Conversational Agents Targeting Adherence to Home Exercises: Four Design and Evaluation Studies JO - J Med Internet Res SP - e23612 VL - 23 IS - 2 KW - ubiquitous coaching KW - augmented reality KW - health care KW - treatment adherence KW - design science research KW - physiotherapy KW - chronic back pain KW - pain KW - chronic pain KW - exercise KW - adherence KW - treatment KW - conversational agent KW - smartphone KW - mobile phone N2 - Background: Effective treatments for various conditions such as obesity, heart diseases, or low back pain require not only personal on-site coaching sessions by health care experts but also a significant amount of home exercises. However, nonadherence to home exercises is still a serious problem as it leads to increased costs due to prolonged treatments. Objective: To improve adherence to home exercises, we propose, implement, and assess the novel coaching concept of hybrid ubiquitous coaching (HUC). In HUC, health care experts are complemented by a conversational agent (CA) that delivers psychoeducation and personalized motivational messages via a smartphone, as well as real-time exercise support, monitoring, and feedback in a hands-free augmented reality environment. Methods: We applied HUC to the field of physiotherapy and conducted 4 design-and-evaluate loops with an interdisciplinary team to assess how HUC is perceived by patients and physiotherapists and whether HUC leads to treatment adherence. A first version of HUC was evaluated by 35 physiotherapy patients in a lab setting to identify patients' 
perceptions of HUC. In addition, 11 physiotherapists were interviewed about HUC and assessed whether the CA could help them build up a working alliance with their patients. A second version was then tested by 15 patients in a within-subject experiment to identify the ability of HUC to address adherence and to build a working alliance between the patient and the CA. Finally, a 4-week n-of-1 trial was conducted with 1 patient to show one experience with HUC in depth and thereby potentially reveal real-world benefits and challenges. Results: Patients perceived HUC to be useful, easy to use, and enjoyable, preferred it to state-of-the-art approaches, and expressed their intentions to use it. Moreover, patients built a working alliance with the CA. Physiotherapists saw a relative advantage of HUC compared to current approaches but initially did not see the potential in terms of a working alliance, which changed after seeing the results of HUC in the field. Qualitative feedback from patients indicated that they enjoyed doing the exercise with an augmented reality–based CA and understood better how to do the exercise correctly with HUC. Moreover, physiotherapists highlighted that HUC would be helpful to use in the therapy process. The longitudinal field study resulted in an adherence rate of 92% (11/12 sessions; 330/360 repetitions; 33/36 sets) and a substantial increase in exercise accuracy during the 4 weeks. Conclusions: The overall positive assessments from both patients and health care experts suggest that HUC is a promising tool to be applied in various disorders with a relevant set of home exercises. Future research, however, must implement a variety of exercises and test HUC with patients suffering from different disorders. 
UR - https://www.jmir.org/2021/2/e23612 UR - http://dx.doi.org/10.2196/23612 UR - http://www.ncbi.nlm.nih.gov/pubmed/33461957 ID - info:doi/10.2196/23612 ER - TY - JOUR AU - Kowatsch, Tobias AU - Schachner, Theresa AU - Harperink, Samira AU - Barata, Filipe AU - Dittler, Ullrich AU - Xiao, Grace AU - Stanger, Catherine AU - v Wangenheim, Florian AU - Fleisch, Elgar AU - Oswald, Helmut AU - Möller, Alexander PY - 2021/2/17 TI - Conversational Agents as Mediating Social Actors in Chronic Disease Management Involving Health Care Professionals, Patients, and Family Members: Multisite Single-Arm Feasibility Study JO - J Med Internet Res SP - e25060 VL - 23 IS - 2 KW - digital health intervention KW - intervention design KW - mHealth KW - eHealth KW - chatbot KW - conversational agent KW - chronic diseases KW - asthma KW - feasibility study N2 - Background: Successful management of chronic diseases requires a trustful collaboration between health care professionals, patients, and family members. Scalable conversational agents, designed to assist health care professionals, may play a significant role in supporting this collaboration in a scalable way by reaching out to the everyday lives of patients and their family members. However, to date, it remains unclear whether conversational agents, in such a role, would be accepted and whether they can support this multistakeholder collaboration. 
Objective: With asthma in children representing a relevant target of chronic disease management, this study had the following objectives: (1) to describe the design of MAX, a conversational agent–delivered asthma intervention that supports health care professionals targeting child-parent teams in their everyday lives; and (2) to assess the (a) reach of MAX, (b) conversational agent–patient working alliance, (c) acceptance of MAX, (d) intervention completion rate, (e) cognitive and behavioral outcomes, and (f) human effort and responsiveness of health care professionals in primary and secondary care settings. Methods: MAX was designed to increase cognitive skills (ie, knowledge about asthma) and behavioral skills (ie, inhalation technique) in 10- to 15-year-olds with asthma, and enables support by a health professional and a family member. To this end, three design goals guided the development: (1) to build a conversational agent–patient working alliance; (2) to offer hybrid (human- and conversational agent–supported) ubiquitous coaching; and (3) to provide an intervention with high experiential value. An interdisciplinary team of computer scientists, asthma experts, and young patients with their parents developed the intervention collaboratively. The conversational agent communicates with health care professionals via email, with patients via a mobile chat app, and with a family member via SMS text messaging. A single-arm feasibility study in primary and secondary care settings was performed to assess MAX. Results: Results indicated an overall positive evaluation of MAX with respect to its reach (49.5%, 49/99 of recruited and eligible patient-family member teams participated), a strong patient-conversational agent working alliance, and high acceptance by all relevant stakeholders. Moreover, MAX led to improved cognitive and behavioral skills and an intervention completion rate of 75.5%. Family members supported the patients in 269 out of 275 (97.8%) coaching sessions. 
Most of the conversational turns (99.5%) were conducted between patients and the conversational agent as opposed to between patients and health care professionals, thus indicating the scalability of MAX. In addition, it took health care professionals less than 4 minutes to assess the inhalation technique and 3 days to deliver related feedback to the patients. Several suggestions for improvement were made. Conclusions: This study provides the first evidence that conversational agents, designed as mediating social actors involving health care professionals, patients, and family members, are not only accepted in such a "team player" role but also show potential to improve health-relevant outcomes in chronic disease management. UR - http://www.jmir.org/2021/2/e25060/ UR - http://dx.doi.org/10.2196/25060 UR - http://www.ncbi.nlm.nih.gov/pubmed/33484114 ID - info:doi/10.2196/25060 ER - TY - JOUR AU - Sato, Ann AU - Haneda, Eri AU - Suganuma, Nobuyasu AU - Narimatsu, Hiroto PY - 2021/2/5 TI - Preliminary Screening for Hereditary Breast and Ovarian Cancer Using a Chatbot Augmented Intelligence Genetic Counselor: Development and Feasibility Study JO - JMIR Form Res SP - e25184 VL - 5 IS - 2 KW - artificial intelligence KW - augmented intelligence KW - hereditary cancer KW - familial cancer KW - IBM Watson KW - preliminary screening KW - cancer KW - genetics KW - chatbot KW - screening KW - feasibility N2 - Background: Breast cancer is the most common form of cancer in Japan; genetic background and hereditary breast and ovarian cancer (HBOC) are implicated. The key to HBOC diagnosis involves screening to identify high-risk individuals. However, genetic medicine is still developing; thus, many patients who may potentially benefit from genetic medicine have not yet been identified. 
Objective: This study's objective is to develop a chatbot system that uses augmented intelligence for HBOC screening to determine whether patients meet the National Comprehensive Cancer Network (NCCN) BRCA1/2 testing criteria. Methods: The system was evaluated by a doctor specializing in genetic medicine and certified genetic counselors. We prepared 3 scenarios and created a conversation with the chatbot to reflect each one. Then we evaluated chatbot feasibility, the required time, the medical accuracy of conversations and family history, and the final result. Results: The times required for the conversation were 7 minutes for scenario 1, 15 minutes for scenario 2, and 16 minutes for scenario 3. Scenarios 1 and 2 met the BRCA1/2 testing criteria, but scenario 3 did not, and this result was consistent with the findings of 3 experts who retrospectively reviewed conversations with the chatbot according to the 3 scenarios. A comparison of the family history ascertained by the chatbot with the actual scenarios revealed that each result was consistent with each scenario. From a genetic medicine perspective, no errors were noted by the 3 experts. Conclusions: This study demonstrated that chatbot systems could be applied to preliminary genetic medicine screening for HBOC. 
UR - https://formative.jmir.org/2021/2/e25184 UR - http://dx.doi.org/10.2196/25184 UR - http://www.ncbi.nlm.nih.gov/pubmed/33544084 ID - info:doi/10.2196/25184 ER - TY - JOUR AU - Schachner, Theresa AU - Gross, Christoph AU - Hasl, Andrea AU - v Wangenheim, Florian AU - Kowatsch, Tobias PY - 2021/1/29 TI - Deliberative and Paternalistic Interaction Styles for Conversational Agents in Digital Health: Procedure and Validation Through a Web-Based Experiment JO - J Med Internet Res SP - e22919 VL - 23 IS - 1 KW - conversational agents KW - chatbots KW - human-computer interaction KW - physician-patient relationship KW - interaction styles KW - deliberative interaction KW - paternalistic interaction KW - digital health KW - chronic conditions KW - COPD N2 - Background: Recent years have witnessed a constant increase in the number of people with chronic conditions requiring ongoing medical support in their everyday lives. However, global health systems are not adequately equipped for this extraordinarily time-consuming and cost-intensive development. Here, conversational agents (CAs) can offer easily scalable and ubiquitous support. However, different aspects of CAs have not yet been sufficiently investigated to fully exploit their potential. One such trait is the interaction style between patients and CAs. In human-to-human settings, the interaction style is an imperative part of the interaction between patients and physicians. Patient-physician interaction is recognized as a critical success factor for patient satisfaction, treatment adherence, and subsequent treatment outcomes. However, so far, it remains effectively unknown how different interaction styles can be implemented into CA interactions and whether these styles are recognizable by users. Objective: The objective of this study was to develop an approach to reproducibly induce 2 specific interaction styles into CA-patient dialogs and subsequently test and validate them in a chronic health care context. 
Methods: On the basis of the Roter Interaction Analysis System and iterative evaluations by scientific experts and medical health care professionals, we identified 10 communication components that characterize the 2 developed interaction styles: deliberative and paternalistic interaction styles. These communication components were used to develop 2 CA variations, each representing one of the 2 interaction styles. We assessed them in a web-based between-subject experiment. The participants were asked to put themselves in the position of a patient with chronic obstructive pulmonary disease. These participants were randomly assigned to interact with one of the 2 CAs and subsequently asked to identify the respective interaction style. Chi-square test was used to assess the correct identification of the CA-patient interaction style. Results: A total of 88 individuals (42/88, 48% female; mean age 31.5 years, SD 10.1 years) fulfilled the inclusion criteria and participated in the web-based experiment. The participants in both the paternalistic and deliberative conditions correctly identified the underlying interaction styles of the CAs in more than 80% of the assessments (χ²1,88=38.2; P<.001; phi coefficient φ=0.68). The validation of the procedure was hence successful. Conclusions: We developed an approach that is tailored for a medical context to induce a paternalistic and deliberative interaction style into a written interaction between a patient and a CA. We successfully tested and validated the procedure in a web-based experiment involving 88 participants. Future research should implement and test this approach among actual patients with chronic diseases and compare the results in different medical conditions. This approach can further be used as a starting point to develop dynamic CAs that adapt their interaction styles to their users. 
UR - http://www.jmir.org/2021/1/e22919/ UR - http://dx.doi.org/10.2196/22919 UR - http://www.ncbi.nlm.nih.gov/pubmed/33512328 ID - info:doi/10.2196/22919 ER - TY - JOUR AU - Abd-Alrazaq, A. Alaa AU - Alajlani, Mohannad AU - Ali, Nashva AU - Denecke, Kerstin AU - Bewick, M. Bridgette AU - Househ, Mowafa PY - 2021/1/13 TI - Perceptions and Opinions of Patients About Mental Health Chatbots: Scoping Review JO - J Med Internet Res SP - e17828 VL - 23 IS - 1 KW - chatbots KW - conversational agents KW - mental health KW - mental disorders KW - perceptions KW - opinions KW - mobile phone N2 - Background: Chatbots have been used in the last decade to improve access to mental health care services. Perceptions and opinions of patients influence the adoption of chatbots for health care. Many studies have been conducted to assess the perceptions and opinions of patients about mental health chatbots. To the best of our knowledge, there has been no review of the evidence surrounding perceptions and opinions of patients about mental health chatbots. Objective: This study aims to conduct a scoping review of the perceptions and opinions of patients about chatbots for mental health. Methods: The scoping review was carried out in line with the PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) extension for scoping reviews guidelines. Studies were identified by searching 8 electronic databases (eg, MEDLINE and Embase) in addition to conducting backward and forward reference list checking of the included studies and relevant reviews. In total, 2 reviewers independently selected studies and extracted data from the included studies. Data were synthesized using thematic analysis. Results: Of 1072 citations retrieved, 37 unique studies were included in the review. 
The thematic analysis generated 10 themes from the findings of the studies: usefulness, ease of use, responsiveness, understandability, acceptability, attractiveness, trustworthiness, enjoyability, content, and comparisons. Conclusions: The results demonstrated overall positive perceptions and opinions of patients about chatbots for mental health. Important issues to be addressed in the future are the linguistic capabilities of the chatbots: they have to be able to deal adequately with unexpected user input, provide high-quality responses, and show high variability in responses. To be useful for clinical practice, we have to find ways to harmonize chatbot content with individual treatment recommendations, that is, a personalization of chatbot conversations is required. UR - http://www.jmir.org/2021/1/e17828/ UR - http://dx.doi.org/10.2196/17828 UR - http://www.ncbi.nlm.nih.gov/pubmed/33439133 ID - info:doi/10.2196/17828 ER - TY - JOUR AU - Leung, W. Yvonne AU - Wouterloot, Elise AU - Adikari, Achini AU - Hirst, Graeme AU - de Silva, Daswin AU - Wong, Jiahui AU - Bender, L. Jacqueline AU - Gancarz, Mathew AU - Gratzer, David AU - Alahakoon, Damminda AU - Esplen, Jane Mary PY - 2021/1/7 TI - Natural Language Processing-Based Virtual Cofacilitator for Online Cancer Support Groups: Protocol for an Algorithm Development and Validation Study JO - JMIR Res Protoc SP - e21453 VL - 10 IS - 1 KW - artificial intelligence KW - cancer KW - online support groups KW - emotional distress KW - natural language processing KW - participant engagement N2 - Background: Cancer and its treatment can significantly impact the short- and long-term psychological well-being of patients and families. Emotional distress and depressive symptomatology are often associated with poor treatment adherence, reduced quality of life, and higher mortality. 
Cancer support groups, especially those led by health care professionals, provide a safe place for participants to discuss fear, normalize stress reactions, share solidarity, and learn about effective strategies to build resilience and enhance coping. However, in-person support groups may not always be accessible to individuals; geographic distance is one of the barriers for access, and compromised physical condition (eg, fatigue, pain) is another. Emerging evidence supports the effectiveness of online support groups in reducing access barriers. Text-based and professional-led online support groups have been offered by Cancer Chat Canada. Participants join the group discussion using text in real time. However, therapist leaders report some challenges leading text-based online support groups in the absence of visual cues, particularly in tracking participant distress. With multiple participants typing at the same time, the nuances of the text messages or red flags for distress can sometimes be missed. Recent advances in artificial intelligence such as deep learning-based natural language processing offer potential solutions. This technology can be used to analyze online support group text data to track participants' expressed emotional distress, including fear, sadness, and hopelessness. Artificial intelligence allows session activities to be monitored in real time and alerts the therapist to participant disengagement. Objective: We aim to develop and evaluate an artificial intelligence-based cofacilitator prototype to track and monitor online support group participants' distress through real-time analysis of text-based messages posted during synchronous sessions. 
Methods: An artificial intelligence-based cofacilitator will be developed to identify participants who are at risk for increased emotional distress and track participant engagement and in-session group cohesion levels, providing real-time alerts for the therapist to follow up; generate postsession participant profiles that contain discussion content keywords and emotion profiles for each session; and automatically suggest tailored resources to participants according to their needs. The study is designed to be conducted in 4 phases consisting of (1) development based on a subset of data and an existing natural language processing framework, (2) performance evaluation using human scoring, (3) beta testing, and (4) user experience evaluation. Results: This study received ethics approval in August 2019. Phase 1, development of an artificial intelligence-based cofacilitator, was completed in January 2020. As of December 2020, phase 2 is underway. The study is expected to be completed by September 2021. Conclusions: An artificial intelligence-based cofacilitator offers a promising new mode of delivery of person-centered online support groups tailored to individual needs. 
International Registered Report Identifier (IRRID): DERR1-10.2196/21453 UR - https://www.researchprotocols.org/2021/1/e21453 UR - http://dx.doi.org/10.2196/21453 UR - http://www.ncbi.nlm.nih.gov/pubmed/33410754 ID - info:doi/10.2196/21453 ER - TY - JOUR AU - Fan, Xiangmin AU - Chao, Daren AU - Zhang, Zhan AU - Wang, Dakuo AU - Li, Xiaohua AU - Tian, Feng PY - 2021/1/6 TI - Utilization of Self-Diagnosis Health Chatbots in Real-World Settings: Case Study JO - J Med Internet Res SP - e19928 VL - 23 IS - 1 KW - self-diagnosis KW - chatbot KW - conversational agent KW - human-artificial intelligence interaction KW - artificial intelligence KW - diagnosis KW - case study KW - eHealth KW - real world KW - user experience N2 - Background: Artificial intelligence (AI)-driven chatbots are increasingly being used in health care, but most chatbots are designed for a specific population and evaluated in controlled settings. There is little research documenting how health consumers (eg, patients and caregivers) use chatbots for self-diagnosis purposes in real-world scenarios. Objective: The aim of this research was to understand how health chatbots are used in a real-world context, what issues and barriers exist in their usage, and how the user experience of this novel technology can be improved. Methods: We employed a data-driven approach to analyze the system log of a widely deployed self-diagnosis chatbot in China. Our data set consisted of 47,684 consultation sessions initiated by 16,519 users over 6 months. The log data included a variety of information, including users' nonidentifiable demographic information, consultation details, diagnostic reports, and user feedback. We conducted both statistical analysis and content analysis on this heterogeneous data set. Results: The chatbot users spanned all age groups, including middle-aged and older adults. 
Users consulted the chatbot on a wide range of medical conditions, including those that often entail considerable privacy and social stigma issues. Furthermore, we distilled 2 prominent issues in the use of the chatbot: (1) a considerable number of users dropped out in the middle of their consultation sessions, and (2) some users pretended to have health concerns and used the chatbot for nontherapeutic purposes. Finally, we identified a set of user concerns regarding the use of the chatbot, including insufficient actionable information and perceived inaccurate diagnostic suggestions. Conclusions: Although health chatbots are considered to be convenient tools for enhancing patient-centered care, there are issues and barriers impeding the optimal use of this novel technology. Designers and developers should employ user-centered approaches to address the issues and user concerns to achieve the best uptake and utilization. We conclude the paper by discussing several design implications, including making the chatbots more informative, easy to use, and trustworthy, as well as improving the onboarding experience to enhance user engagement. UR - https://www.jmir.org/2021/1/e19928 UR - http://dx.doi.org/10.2196/19928 UR - http://www.ncbi.nlm.nih.gov/pubmed/33404508 ID - info:doi/10.2196/19928 ER - TY - JOUR AU - Kramer, L. Lean AU - Mulder, C. Bob AU - van Velsen, Lex AU - de Vet, Emely PY - 2021/1/6 TI - Use and Effect of Web-Based Embodied Conversational Agents for Improving Eating Behavior and Decreasing Loneliness Among Community-Dwelling Older Adults: Protocol for a Randomized Controlled Trial JO - JMIR Res Protoc SP - e22186 VL - 10 IS - 1 KW - embodied conversational agent KW - health behavior change KW - loneliness KW - eating behavior KW - older adults N2 - Background: An unhealthy eating pattern and loneliness negatively influence quality of life in older age. 
Embodied conversational agents (ECAs) are a promising way to address these health behaviors in an engaging manner. Objective: We aim to (1) identify whether ECAs can persuade community-dwelling older adults to change their dietary behavior and whether ECA use can decrease loneliness, (2) test these pathways to effects, and (3) understand the use of an ECA. Methods: The web-based eHealth app PACO is a fully automated 8-week intervention in which 2 ECAs engage older adults in dialogue to motivate them to change their dietary behavior and decrease their loneliness. PACO was developed via a human-centered and stakeholder-inclusive design approach and incorporates Self-Determination Theory and various behavior change techniques. For this study, an unblinded randomized controlled trial will be performed. There will be 2 cohorts, with 30 participants per cohort. Participants in the first cohort will immediately receive the PACO app for 8 weeks, while participants in the second cohort receive the PACO app after a waiting-list condition of 4 weeks. Participants will be recruited via social media, an online panel, flyers, and advertorials. To be eligible, participants must be at least 65 years of age, must not be in paid employment, and must live alone independently at home. Primary outcomes will be self-assessed via online questionnaires at intake, control, after 4 weeks, and after 8 weeks, and will include eating behavior and loneliness. In addition, the primary outcome, use, will be measured via data logs. Secondary outcomes will be measured at the same junctures, via either validated, self-assessed, online questionnaires or an optional interview. Results: As of July 2020, we have begun recruiting participants. Conclusions: By unraveling the mechanisms behind the use of a web-based intervention with ECAs, we hope to gain a fine-grained understanding of both the effectiveness and the use of ECAs in the health context. 
Trial Registration: ClinicalTrials.gov NCT04510883; https://clinicaltrials.gov/ct2/show/NCT04510883 International Registered Report Identifier (IRRID): PRR1-10.2196/22186 UR - https://www.researchprotocols.org/2021/1/e22186 UR - http://dx.doi.org/10.2196/22186 UR - http://www.ncbi.nlm.nih.gov/pubmed/33404513 ID - info:doi/10.2196/22186 ER - TY - JOUR AU - Safi, Zeineb AU - Abd-Alrazaq, Alaa AU - Khalifa, Mohamed AU - Househ, Mowafa PY - 2020/12/18 TI - Technical Aspects of Developing Chatbots for Medical Applications: Scoping Review JO - J Med Internet Res SP - e19127 VL - 22 IS - 12 KW - chatbots KW - conversational agents KW - medical applications KW - scoping review KW - technical aspects N2 - Background: Chatbots are applications that can conduct natural language conversations with users. In the medical field, chatbots have been developed and used to serve different purposes. They provide patients with timely information that can be critical in some scenarios, such as access to mental health resources. Since the development of the first chatbot, ELIZA, in the late 1960s, much effort has followed to produce chatbots for various health purposes developed in different ways. Objective: This study aimed to explore the technical aspects and development methodologies associated with chatbots used in the medical field to explain the best methods of development and support chatbot development researchers on their future work. Methods: We searched for relevant articles in 8 literature databases (IEEE, ACM, Springer, ScienceDirect, Embase, MEDLINE, PsycINFO, and Google Scholar). We also performed forward and backward reference checking of the selected articles. Study selection was performed by one reviewer, and 50% of the selected studies were randomly checked by a second reviewer. A narrative approach was used for result synthesis. Chatbots were classified based on the different technical aspects of their development. 
The main chatbot components were identified in addition to the different techniques for implementing each module. Results: The original search returned 2481 publications, of which we identified 45 studies that matched our inclusion and exclusion criteria. The most common language of communication between users and chatbots was English (n=23). We identified 4 main modules: text understanding module, dialogue management module, database layer, and text generation module. The most common technique for developing text understanding and dialogue management is the pattern matching method (n=18 and n=25, respectively). The most common text generation technique is fixed output (n=36). Very few studies relied on generating original output. Most studies kept a medical knowledge base to be used by the chatbot for different purposes throughout the conversations. A few studies kept conversation scripts and collected user data and previous conversations. Conclusions: Many chatbots have been developed for medical use, at an increasing rate. There is a recent, apparent shift in adopting machine learning-based approaches for developing chatbot systems. Further research can be conducted to link clinical outcomes to different chatbot development techniques and technical characteristics. 
UR - http://www.jmir.org/2020/12/e19127/ UR - http://dx.doi.org/10.2196/19127 UR - http://www.ncbi.nlm.nih.gov/pubmed/33337337 ID - info:doi/10.2196/19127 ER - TY - JOUR AU - Kowalska, Małgorzata AU - Gładyś, Aleksandra AU - Kalańska-Łukasik, Barbara AU - Gruz-Kwapisz, Monika AU - Wojakowski, Wojciech AU - Jadczyk, Tomasz PY - 2020/12/17 TI - Readiness for Voice Technology in Patients With Cardiovascular Diseases: Cross-Sectional Study JO - J Med Internet Res SP - e20456 VL - 22 IS - 12 KW - voice technology KW - smart speaker KW - acceptance KW - telehealth KW - cardiovascular diseases KW - chatbot N2 - Background: The clinical application of voice technology provides novel opportunities in the field of telehealth. However, patients' readiness for this solution has not been investigated among patients with cardiovascular diseases (CVD). Objective: This paper aims to evaluate patients' anticipated experiences regarding telemedicine, including voice conversational agents combined with provider-driven support delivered by phone. Methods: A cross-sectional study enrolled patients with chronic CVD who were surveyed using a validated investigator-designed questionnaire combining 19 questions (eg, demographic data, medical history, preferences for using telehealth services). Prior to the survey, respondents were educated on the telemedicine services presented in the questionnaire while being assisted by a medical doctor. Responses were then collected and analyzed, and multivariate logistic regression was used to identify predictors of willingness to use voice technology. Results: In total, 249 patients (mean age 65.3, SD 13.8 years; 158 [63.5%] men) completed the questionnaire, which showed good repeatability in the validation procedure. Of the 249 total participants, 209 (83.9%) reported high readiness to receive services allowing for remote contact with a cardiologist (176/249, 70.7%) and telemonitoring of vital signs (168/249, 67.5%). 
The voice conversational agents combined with provider-driven support delivered by phone were shown to be highly anticipated by patients with CVD. The readiness to use telehealth was statistically higher in people with previous difficulties accessing health care (OR 2.920, 95% CI 1.377-6.192) and was most frequent in city residents and individuals reporting a higher education level. The age and sex of the respondents did not impact the intention to use voice technology (P=.20 and P=.50, respectively). Conclusions: Patients with cardiovascular diseases, including both younger and older individuals, declared high readiness for voice technology. UR - http://www.jmir.org/2020/12/e20456/ UR - http://dx.doi.org/10.2196/20456 UR - http://www.ncbi.nlm.nih.gov/pubmed/33331824 ID - info:doi/10.2196/20456 ER - TY - JOUR AU - te Pas, E. Mariska AU - Rutten, M. Werner G. M. AU - Bouwman, Arthur R. AU - Buise, P. Marc PY - 2020/12/7 TI - User Experience of a Chatbot Questionnaire Versus a Regular Computer Questionnaire: Prospective Comparative Study JO - JMIR Med Inform SP - e21982 VL - 8 IS - 12 KW - chatbot KW - user experience KW - questionnaires KW - response rates KW - value-based health care N2 - Background: Respondent engagement of questionnaires in health care is fundamental to ensure adequate response rates for the evaluation of services and quality of care. Conventional survey designs are often perceived as dull and unengaging, resulting in negative respondent behavior. It is necessary to make completing a questionnaire attractive and motivating. Objective: The aim of this study is to compare the user experience of a chatbot questionnaire, which mimics intelligent conversation, with a regular computer questionnaire. Methods: The research took place at the preoperative outpatient clinic. Patients completed both the standard computer questionnaire and the new chatbot questionnaire. 
Afterward, patients gave their feedback on both questionnaires using the User Experience Questionnaire, which consists of 26 terms to score. Results: The mean age of the 40 included patients (25 [63%] women) was 49 (range 18-79) years; 46.73% (486/1040) of all terms were scored positive for the chatbot. Patients preferred the computer for 7.98% (83/1040) of the terms and for 47.88% (498/1040) of the terms there were no differences. Completion (mean time) of the computer questionnaire took 9.00 minutes by men (SD 2.72) and 7.72 minutes by women (SD 2.60; P=.148). For the chatbot, completion by men took 8.33 minutes (SD 2.99) and by women 7.36 minutes (SD 2.61; P=.287). Conclusions: Patients preferred the chatbot questionnaire over the computer questionnaire. Time to completion of both questionnaires did not differ, though the chatbot questionnaire on a tablet felt more rapid compared to the computer questionnaire. This is an important finding because it could lead to higher response rates and to qualitatively better responses in future questionnaires. 
UR - http://medinform.jmir.org/2020/12/e21982/ UR - http://dx.doi.org/10.2196/21982 UR - http://www.ncbi.nlm.nih.gov/pubmed/33284125 ID - info:doi/10.2196/21982 ER - TY - JOUR AU - Ferré, Fabrice AU - Boeschlin, Nicolas AU - Bastiani, Bruno AU - Castel, Adeline AU - Ferrier, Anne AU - Bosch, Laetitia AU - Muscari, Fabrice AU - Kurrek, Matt AU - Fourcade, Olivier AU - Piau, Antoine AU - Minville, Vincent PY - 2020/12/4 TI - Improving Provision of Preanesthetic Information Through Use of the Digital Conversational Agent "MyAnesth": Prospective Observational Trial JO - J Med Internet Res SP - e20455 VL - 22 IS - 12 KW - chatbot KW - digital conversational agent KW - preanesthetic consultation KW - Abric method KW - eHealth KW - digital health KW - anesthesia N2 - Background: Due to time limitations, the preanesthetic consultation (PAC) is not the best time for patients to integrate information specific to their perioperative care pathway. Objective: The main objectives of this study were to evaluate the effectiveness of a digital companion on patients' knowledge of anesthesia and their satisfaction after real-life implementation. Methods: We conducted a prospective, monocentric, comparative study using a before-and-after design. In phase 1, a 9-item self-reported anesthesia knowledge test (Delphi method) was administered to patients before and after their PAC (control group: PAC group). In phase 2, the study was repeated immediately after the implementation of a digital conversational agent, MyAnesth (@+PAC group). Patients' satisfaction and their representations for anesthesia were also assessed using a Likert scale and the Abric method of hierarchized evocation. Results: A total of 600 tests were distributed; 205 patients and 98 patients were included in the PAC group and @+PAC group, respectively. 
Demographic characteristics and mean scores on the 9-point preinformation test (PAC group: 4.2 points, 95% CI 3.9-4.4; @+PAC: 4.3 points, 95% CI 4-4.7; P=.37) were similar in the two groups. The mean score after receiving information was better in the @+PAC group than in the PAC group (6.1 points, 95% CI 5.8-6.4 points versus 5.2 points, 95% CI 5.0-5.4 points, respectively; P<.001), with an added value of 0.7 points (95% CI 0.3-1.1; P<.001). Among the respondents in the @+PAC group, 82% found the information to be clear and appropriate, and 74% found it easily accessible. Before receiving information, the central core of patients' representations for anesthesia was focused on the fear of being put to sleep and thereafter on caregiver skills and comfort. Conclusions: The implementation of our digital conversational agent in addition to the PAC improved patients' knowledge about their perioperative care pathway. This innovative audiovisual support seemed clear, adapted, easily accessible, and reassuring. Future studies should focus on adapting both the content and delivery of a digital conversational agent for the PAC in order to maximize its benefit to patients. UR - https://www.jmir.org/2020/12/e20455 UR - http://dx.doi.org/10.2196/20455 UR - http://www.ncbi.nlm.nih.gov/pubmed/33275108 ID - info:doi/10.2196/20455 ER - TY - JOUR AU - Morse, E. Keith AU - Ostberg, P. Nicolai AU - Jones, G. Veena AU - Chan, S. Albert PY - 2020/11/30 TI - Use Characteristics and Triage Acuity of a Digital Symptom Checker in a Large Integrated Health System: Population-Based Descriptive Study JO - J Med Internet Res SP - e20549 VL - 22 IS - 11 KW - symptom checker KW - chatbot KW - computer-assisted diagnosis KW - diagnostic self-evaluation KW - artificial intelligence KW - self-care KW - COVID-19 N2 - Background: Pressure on the US health care system has been increasing due to a combination of aging populations, rising health care expenditures, and most recently, the COVID-19 pandemic. 
Responses to this pressure are hindered in part by reliance on a limited supply of highly trained health care professionals, creating a need for scalable technological solutions. Digital symptom checkers are artificial intelligence-supported software tools that use a conversational "chatbot" format to support rapid diagnosis and consistent triage. The COVID-19 pandemic has brought new attention to these tools due to the need to avoid face-to-face contact and preserve urgent care capacity. However, evidence-based deployment of these chatbots requires an understanding of user demographics and associated triage recommendations generated by a large general population. Objective: In this study, we evaluate the user demographics and levels of triage acuity provided by a symptom checker chatbot deployed in partnership with a large integrated health system in the United States. Methods: This population-based descriptive study included all web-based symptom assessments completed on the website and patient portal of the Sutter Health system (24 hospitals in Northern California) from April 24, 2019, to February 1, 2020. User demographics were compared to relevant US Census population data. Results: A total of 26,646 symptom assessments were completed during the study period. Most assessments (17,816/26,646, 66.9%) were completed by female users. The mean user age was 34.3 years (SD 14.4 years), compared to a median age of 37.3 years of the general population. The most common initial symptom was abdominal pain (2060/26,646, 7.7%). A substantial number of assessments (12,357/26,646, 46.4%) were completed outside of typical physician office hours. Most users were advised to seek medical care on the same day (7299/26,646, 27.4%) or within 2-3 days (6301/26,646, 23.6%). Over a quarter of the assessments indicated a high degree of urgency (7723/26,646, 29.0%). 
Conclusions: Users of the symptom checker chatbot were broadly representative of our patient population, although they skewed toward younger and female users. The triage recommendations were comparable to those of nurse-staffed telephone triage lines. Although the emergence of COVID-19 has increased the interest in remote medical assessment tools, it is important to take an evidence-based approach to their deployment. UR - https://www.jmir.org/2020/11/e20549 UR - http://dx.doi.org/10.2196/20549 UR - http://www.ncbi.nlm.nih.gov/pubmed/33170799 ID - info:doi/10.2196/20549 ER - TY - JOUR AU - Dosovitsky, Gilly AU - Pineda, S. Blanca AU - Jacobson, C. Nicholas AU - Chang, Cyrus AU - Escoredo, Milagros AU - Bunge, L. Eduardo PY - 2020/11/13 TI - Artificial Intelligence Chatbot for Depression: Descriptive Study of Usage JO - JMIR Form Res SP - e17065 VL - 4 IS - 11 KW - chatbot KW - artificial intelligence KW - depression KW - mobile health KW - telehealth N2 - Background: Chatbots could be a scalable solution that provides an interactive means of engaging users in behavioral health interventions driven by artificial intelligence. Although some chatbots have shown promising early efficacy results, there is limited information about how people use these chatbots. Understanding the usage patterns of chatbots for depression represents a crucial step toward improving chatbot design and providing information about the strengths and limitations of the chatbots. Objective: This study aims to understand how users engage and are redirected through a chatbot for depression (Tess) to provide design recommendations. Methods: Interactions of 354 users with the Tess depression modules were analyzed to understand chatbot usage across and within modules. Descriptive statistics were used to analyze participant flow through each depression module, including characters per message, completion rate, and time spent per module. 
Slide plots were also used to analyze the flow across and within modules. Results: Users sent a total of 6220 messages, with a total of 86,298 characters, and, on average, they engaged with Tess depression modules for 46 days. There was large heterogeneity in user engagement across different modules, which appeared to be affected by the length, complexity, content, and style of questions within the modules and the routing between modules. Conclusions: Overall, participants engaged with Tess; however, there was a heterogeneous usage pattern because of varying module designs. Major implications for future chatbot design and evaluation are discussed in the paper. UR - http://formative.jmir.org/2020/11/e17065/ UR - http://dx.doi.org/10.2196/17065 UR - http://www.ncbi.nlm.nih.gov/pubmed/33185563 ID - info:doi/10.2196/17065 ER - TY - JOUR AU - Koman, Jason AU - Fauvelle, Khristina AU - Schuck, Stéphane AU - Texier, Nathalie AU - Mebarki, Adel PY - 2020/11/10 TI - Physicians' Perceptions of the Use of a Chatbot for Information Seeking: Qualitative Study JO - J Med Internet Res SP - e15185 VL - 22 IS - 11 KW - health KW - digital health KW - innovation KW - conversational agent KW - decision support system KW - qualitative research KW - chatbot KW - bot KW - medical drugs KW - prescription KW - risk minimization measures N2 - Background: Seeking medical information can be an issue for physicians. In the specific context of medical practice, chatbots are hypothesized to present additional value for providing information quickly, particularly as far as drug risk minimization measures are concerned. Objective: This qualitative study aimed to elicit physicians' perceptions of a pilot version of a chatbot used in the context of drug information and risk minimization measures. Methods: General practitioners and specialists were recruited across France to participate in individual semistructured interviews. 
Interviews were recorded, transcribed, and analyzed using a horizontal thematic analysis approach. Results: A total of 8 general practitioners and 2 specialists participated. The tone and ergonomics of the pilot version were appreciated by physicians. However, all participants emphasized the importance of getting exhaustive, trustworthy answers when interacting with a chatbot. Conclusions: The chatbot was perceived as a useful and innovative tool that could easily be integrated into routine medical practice and could help health professionals when seeking information on drugs and risk minimization measures. UR - http://www.jmir.org/2020/11/e15185/ UR - http://dx.doi.org/10.2196/15185 UR - http://www.ncbi.nlm.nih.gov/pubmed/33170134 ID - info:doi/10.2196/15185 ER - TY - JOUR AU - Gong, Enying AU - Baptista, Shaira AU - Russell, Anthony AU - Scuffham, Paul AU - Riddell, Michaela AU - Speight, Jane AU - Bird, Dominique AU - Williams, Emily AU - Lotfaliany, Mojtaba AU - Oldenburg, Brian PY - 2020/11/5 TI - My Diabetes Coach, a Mobile App-Based Interactive Conversational Agent to Support Type 2 Diabetes Self-Management: Randomized Effectiveness-Implementation Trial JO - J Med Internet Res SP - e20322 VL - 22 IS - 11 KW - type 2 diabetes mellitus KW - self-management KW - health-related quality of life KW - digital technology KW - coaching KW - mobile phone N2 - Background: Delivering self-management support to people with type 2 diabetes mellitus is essential to reduce the health system burden and to empower people with the skills, knowledge, and confidence needed to take an active role in managing their own health. Objective: This study aims to evaluate the adoption, use, and effectiveness of the My Diabetes Coach (MDC) program, an app-based interactive embodied conversational agent, Laura, designed to support diabetes self-management in the home setting over 12 months. 
Methods: This randomized controlled trial evaluated both the implementation and effectiveness of the MDC program. Adults with type 2 diabetes in Australia were recruited and randomized to the intervention arm (MDC) or the control arm (usual care). Program use was tracked over 12 months. Coprimary outcomes included changes in glycated hemoglobin (HbA1c) and health-related quality of life (HRQoL). Data were assessed at baseline and at 6 and 12 months, and analyzed using linear mixed-effects regression models. Results: A total of 187 adults with type 2 diabetes (mean age 57 years, SD 10 years; 41.7% women) were recruited and randomly allocated to the intervention (n=93) and control (n=94) arms. MDC program users (92/93 participants) completed 1942 chats with Laura, averaging 243 min (SD 212) per person over 12 months. Compared with baseline, the mean estimated HbA1c decreased in both arms at 12 months (intervention: 0.33% and control: 0.20%), but the net difference between the two arms in change of HbA1c (-0.04%, 95% CI -0.45 to 0.36; P=.83) was not statistically significant. At 12 months, HRQoL utility scores improved in the intervention arm, compared with the control arm (between-arm difference: 0.04, 95% CI 0.00 to 0.07; P=.04). Conclusions: The MDC program was successfully adopted and used by individuals with type 2 diabetes and significantly improved the users' HRQoL. These findings suggest the potential for wider implementation of technology-enabled conversation-based programs for supporting diabetes self-management. Future studies should focus on strategies to maintain program usage and HbA1c improvement. 
Trial Registration: Australia New Zealand Clinical Trials Registry (ACTRN) 12614001229662; https://anzctr.org.au/Trial/Registration/TrialReview.aspx?ACTRN=12614001229662 UR - https://www.jmir.org/2020/11/e20322 UR - http://dx.doi.org/10.2196/20322 UR - http://www.ncbi.nlm.nih.gov/pubmed/33151154 ID - info:doi/10.2196/20322 ER - TY - JOUR AU - Almusharraf, Fahad AU - Rose, Jonathan AU - Selby, Peter PY - 2020/11/3 TI - Engaging Unmotivated Smokers to Move Toward Quitting: Design of Motivational Interviewing-Based Chatbot Through Iterative Interactions JO - J Med Internet Res SP - e20251 VL - 22 IS - 11 KW - smoking cessation KW - motivational interviewing KW - chatbot KW - natural language processing N2 - Background: At any given time, most smokers in a population are ambivalent with no motivation to quit. Motivational interviewing (MI) is an evidence-based technique that aims to elicit change in ambivalent smokers. MI practitioners are scarce and expensive, and smokers are difficult to reach. Smokers are potentially reachable through the web, and if an automated chatbot could emulate an MI conversation, it could form the basis of a low-cost and scalable intervention motivating smokers to quit. Objective: The primary goal of this study is to design, train, and test an automated MI-based chatbot capable of eliciting reflection in a conversation with cigarette smokers. This study describes the process of collecting training data to improve the chatbot's ability to generate MI-oriented responses, particularly reflections and summary statements. The secondary goal of this study is to observe the effects on participants through voluntary feedback given after completing a conversation with the chatbot. Methods: An interdisciplinary collaboration between an MI expert and experts in computer engineering and natural language processing (NLP) co-designed the conversation and algorithms underlying the chatbot. 
A sample of 121 adult cigarette smokers in 11 successive groups was recruited from a web-based platform for a single-arm prospective iterative design study. The chatbot was designed to stimulate reflections on the pros and cons of smoking using MI's running head start technique. Participants were also asked to confirm the chatbot's classification of their free-form responses to measure the classification accuracy of the underlying NLP models. Each group provided responses that were used to train the chatbot for the next group. Results: A total of 6568 responses from 121 participants in 11 successive groups over 14 weeks were received. From these responses, we were able to isolate 21 unique reasons for and against smoking and the relative frequency of each. The gradual collection of responses as inputs and smoking reasons as labels over the 11 iterations improved the F1 score of the classification within the chatbot from 0.63 in the first group to 0.82 in the final group. The mean time spent by each participant interacting with the chatbot was 21.3 (SD 14.0) min (minimum 6.4 and maximum 89.2). We also found that 34.7% (42/121) of participants enjoyed the interaction with the chatbot, and 8.3% (10/121) of participants noted explicit smoking cessation benefits from the conversation in voluntary feedback, although this was not explicitly solicited. Conclusions: Recruiting ambivalent smokers through the web is a viable method to train a chatbot to increase accuracy in reflection and summary statements, the building blocks of MI. A new set of 21 smoking reasons (both for and against) has been identified. Initial feedback from smokers on the experience shows promise toward using it in an intervention. 
UR - https://www.jmir.org/2020/11/e20251 UR - http://dx.doi.org/10.2196/20251 UR - http://www.ncbi.nlm.nih.gov/pubmed/33141095 ID - info:doi/10.2196/20251 ER - TY - JOUR AU - Milne-Ives, Madison AU - de Cock, Caroline AU - Lim, Ernest AU - Shehadeh, Harper Melissa AU - de Pennington, Nick AU - Mole, Guy AU - Normando, Eduardo AU - Meinert, Edward PY - 2020/10/22 TI - The Effectiveness of Artificial Intelligence Conversational Agents in Health Care: Systematic Review JO - J Med Internet Res SP - e20346 VL - 22 IS - 10 KW - artificial intelligence KW - avatar KW - chatbot KW - conversational agent KW - digital health KW - intelligent assistant KW - speech recognition software KW - virtual assistant KW - virtual coach KW - virtual health care KW - virtual nursing KW - voice recognition software N2 - Background: The high demand for health care services and the growing capability of artificial intelligence have led to the development of conversational agents designed to support a variety of health-related activities, including behavior change, treatment support, health monitoring, training, triage, and screening support. Automation of these tasks could free clinicians to focus on more complex work and increase the accessibility of health care services for the public. An overarching assessment of the acceptability, usability, and effectiveness of these agents in health care is needed to collate the evidence so that future development can target areas for improvement and potential for sustainable adoption. Objective: This systematic review aims to assess the effectiveness and usability of conversational agents in health care and identify the elements that users like and dislike to inform future research and development of these agents. 
Methods: PubMed, Medline (Ovid), EMBASE (Excerpta Medica dataBASE), CINAHL (Cumulative Index to Nursing and Allied Health Literature), Web of Science, and the Association for Computing Machinery Digital Library were systematically searched for articles published since 2008 that evaluated unconstrained natural language processing conversational agents used in health care. EndNote (version X9, Clarivate Analytics) reference management software was used for initial screening, and full-text screening was conducted by 1 reviewer. Data were extracted, and the risk of bias was assessed by 1 reviewer and validated by another. Results: A total of 31 studies were selected and included a variety of conversational agents, including 14 chatbots (2 of which were voice chatbots), 6 embodied conversational agents (3 of which were interactive voice response calls, virtual patients, and speech recognition screening systems), 1 contextual question-answering agent, and 1 voice recognition triage system. Overall, the evidence reported was mostly positive or mixed. Usability and satisfaction were rated positively in most studies (27/30 and 26/31, respectively), and positive or mixed effectiveness was found in three-quarters of the studies (23/30). However, there were several limitations of the agents highlighted in specific qualitative feedback. Conclusions: The studies generally reported positive or mixed evidence for the effectiveness, usability, and satisfaction of the conversational agents investigated, but qualitative user perceptions were more mixed. The quality of many of the studies was limited, and improved study design and reporting are necessary to more accurately evaluate the usefulness of the agents in health care and identify key areas for improvement. Further research should also analyze the cost-effectiveness, privacy, and security of the agents. 
International Registered Report Identifier (IRRID): RR2-10.2196/16934 UR - http://www.jmir.org/2020/10/e20346/ UR - http://dx.doi.org/10.2196/20346 UR - http://www.ncbi.nlm.nih.gov/pubmed/33090118 ID - info:doi/10.2196/20346 ER - TY - JOUR AU - Munsch, Nicolas AU - Martin, Alistair AU - Gruarin, Stefanie AU - Nateqi, Jama AU - Abdarahmane, Isselmou AU - Weingartner-Ortner, Rafael AU - Knapp, Bernhard PY - 2020/10/6 TI - Diagnostic Accuracy of Web-Based COVID-19 Symptom Checkers: Comparison Study JO - J Med Internet Res SP - e21299 VL - 22 IS - 10 KW - COVID-19 KW - symptom checkers KW - benchmark KW - digital health KW - symptom KW - chatbot KW - accuracy N2 - Background: A large number of web-based COVID-19 symptom checkers and chatbots have been developed; however, anecdotal evidence suggests that their conclusions are highly variable. To our knowledge, no study has evaluated the accuracy of COVID-19 symptom checkers in a statistically rigorous manner. Objective: The aim of this study is to evaluate and compare the diagnostic accuracies of web-based COVID-19 symptom checkers. Methods: We identified 10 web-based COVID-19 symptom checkers, all of which were included in the study. We evaluated the COVID-19 symptom checkers by assessing 50 COVID-19 case reports alongside 410 non-COVID-19 control cases. A bootstrapping method was used to counter the unbalanced sample sizes and obtain confidence intervals (CIs). Results are reported as sensitivity, specificity, F1 score, and Matthews correlation coefficient (MCC). Results: The classification task between COVID-19-positive and COVID-19-negative for "high risk" 
cases among the 460 test cases yielded (sorted by F1 score): Symptoma (F1=0.92, MCC=0.85), Infermedica (F1=0.80, MCC=0.61), US Centers for Disease Control and Prevention (CDC) (F1=0.71, MCC=0.30), Babylon (F1=0.70, MCC=0.29), Cleveland Clinic (F1=0.40, MCC=0.07), Providence (F1=0.40, MCC=0.05), Apple (F1=0.29, MCC=-0.10), Docyet (F1=0.27, MCC=0.29), Ada (F1=0.24, MCC=0.27), and Your.MD (F1=0.24, MCC=0.27). For "high risk" and "medium risk" combined, the performance was: Symptoma (F1=0.91, MCC=0.83), Infermedica (F1=0.80, MCC=0.61), Cleveland Clinic (F1=0.76, MCC=0.47), Providence (F1=0.75, MCC=0.45), Your.MD (F1=0.72, MCC=0.33), CDC (F1=0.71, MCC=0.30), Babylon (F1=0.70, MCC=0.29), Apple (F1=0.70, MCC=0.25), Ada (F1=0.42, MCC=0.03), and Docyet (F1=0.27, MCC=0.29). Conclusions: We found that the number of correctly assessed COVID-19 and control cases varies considerably between symptom checkers, with different symptom checkers showing different strengths with respect to sensitivity and specificity. A good balance between sensitivity and specificity was only achieved by two symptom checkers. UR - http://www.jmir.org/2020/10/e21299/ UR - http://dx.doi.org/10.2196/21299 UR - http://www.ncbi.nlm.nih.gov/pubmed/33001828 ID - info:doi/10.2196/21299 ER - TY - JOUR AU - Zhang, Jingwen AU - Oh, Jung Yoo AU - Lange, Patrick AU - Yu, Zhou AU - Fukuoka, Yoshimi PY - 2020/9/30 TI - Artificial Intelligence Chatbot Behavior Change Model for Designing Artificial Intelligence Chatbots to Promote Physical Activity and a Healthy Diet: Viewpoint JO - J Med Internet Res SP - e22845 VL - 22 IS - 9 KW - chatbot KW - conversational agent KW - artificial intelligence KW - physical activity KW - diet KW - intervention KW - behavior change KW - natural language processing KW - communication N2 - Background: Chatbots empowered by artificial intelligence (AI) can increasingly engage in natural conversations and build relationships with users. 
Applying AI chatbots to lifestyle modification programs is one of the promising areas to develop cost-effective and feasible behavior interventions to promote physical activity and a healthy diet. Objective: The purposes of this perspective paper are to present a brief literature review of chatbot use in promoting physical activity and a healthy diet, describe the AI chatbot behavior change model our research team developed based on extensive interdisciplinary research, and discuss ethical principles and considerations. Methods: We conducted a preliminary search of studies reporting chatbots for improving physical activity and/or diet in four databases in July 2020. We summarized the characteristics of the chatbot studies and reviewed recent developments in human-AI communication research and innovations in natural language processing. Based on the identified gaps and opportunities, as well as our own clinical and research experience and findings, we propose an AI chatbot behavior change model. Results: Our review found a lack of theoretical guidance and practical recommendations for designing AI chatbots for lifestyle modification programs. The proposed AI chatbot behavior change model consists of the following four components to provide such guidance: (1) designing chatbot characteristics and understanding user background; (2) building relational capacity; (3) building persuasive conversational capacity; and (4) evaluating mechanisms and outcomes. The rationale and evidence supporting the design and evaluation choices for this model are presented in this paper. Conclusions: As AI chatbots become increasingly integrated into various digital communications, our proposed theoretical framework is the first step to conceptualize the scope of utilization in health behavior change domains and to synthesize all possible dimensions of chatbot features to inform intervention design and evaluation. 
There is a need for more interdisciplinary work to continue developing AI techniques to improve a chatbot's relational and persuasive capacities to change physical activity and diet behaviors with strong ethical principles. UR - https://www.jmir.org/2020/9/e22845 UR - http://dx.doi.org/10.2196/22845 UR - http://www.ncbi.nlm.nih.gov/pubmed/32996892 ID - info:doi/10.2196/22845 ER - TY - JOUR AU - Li, Juan AU - Maharjan, Bikesh AU - Xie, Bo AU - Tao, Cui PY - 2020/9/21 TI - A Personalized Voice-Based Diet Assistant for Caregivers of Alzheimer Disease and Related Dementias: System Development and Validation JO - J Med Internet Res SP - e19897 VL - 22 IS - 9 KW - Alzheimer disease KW - dementia KW - diet KW - knowledge KW - ontology KW - voice assistant N2 - Background: The world's aging population is increasing, with an expected increase in the prevalence of Alzheimer disease and related dementias (ADRD). Proper nutrition and good eating behavior show promise for preventing and slowing the progression of ADRD and consequently improving the health status and quality of life of patients with ADRD. Most ADRD care is provided by informal caregivers, so assisting caregivers in managing the diet of patients with ADRD is important. Objective: This study aims to design, develop, and test an artificial intelligence-powered voice assistant to help informal caregivers manage the daily diet of patients with ADRD and learn food and nutrition-related knowledge. Methods: The voice assistant is being implemented in several steps: construction of a comprehensive knowledge base with ontologies that define ADRD diet care and user profiles, extended with external knowledge graphs; management of conversation between users and the voice assistant; personalized ADRD diet services provided through a semantics-based knowledge graph search and reasoning engine; and system evaluation in use cases with additional qualitative evaluations. 
Results: A prototype voice assistant was evaluated in the lab using various use cases. Preliminary qualitative test results demonstrate reasonable rates of dialogue success and recommendation correctness. Conclusions: The voice assistant provides a natural, interactive interface for users, and it does not require the user to have a technical background, which may facilitate senior caregivers' use in their daily care tasks. This study suggests the feasibility of using the intelligent voice assistant to help caregivers manage the diet of patients with ADRD. UR - http://www.jmir.org/2020/9/e19897/ UR - http://dx.doi.org/10.2196/19897 UR - http://www.ncbi.nlm.nih.gov/pubmed/32955452 ID - info:doi/10.2196/19897 ER - TY - JOUR AU - Schachner, Theresa AU - Keller, Roman AU - v Wangenheim, Florian PY - 2020/9/14 TI - Artificial Intelligence-Based Conversational Agents for Chronic Conditions: Systematic Literature Review JO - J Med Internet Res SP - e20701 VL - 22 IS - 9 KW - artificial intelligence KW - conversational agents KW - chatbots KW - healthcare KW - chronic diseases KW - systematic literature review N2 - Background: A rising number of conversational agents or chatbots are equipped with artificial intelligence (AI) architecture. They are increasingly prevalent in health care applications such as those providing education and support to patients with chronic diseases, one of the leading causes of death in the 21st century. AI-based chatbots enable more effective and frequent interactions with such patients. Objective: The goal of this systematic literature review is to review the characteristics, health care conditions, and AI architectures of AI-based conversational agents designed specifically for chronic diseases. Methods: We conducted a systematic literature review using PubMed MEDLINE, EMBASE, PsycInfo, CINAHL, ACM Digital Library, ScienceDirect, and Web of Science. We applied a predefined search strategy using the terms "conversational agent," "healthcare," 
"artificial intelligence," and their synonyms. We updated the search results using Google alerts, and screened reference lists for other relevant articles. We included primary research studies that involved the prevention, treatment, or rehabilitation of chronic diseases, involved a conversational agent, and included any kind of AI architecture. Two independent reviewers conducted screening and data extraction, and Cohen kappa was used to measure interrater agreement. A narrative approach was applied for data synthesis. Results: The literature search found 2052 articles, out of which 10 papers met the inclusion criteria. The small number of identified studies together with the prevalence of quasi-experimental studies (n=7) and prevailing prototype nature of the chatbots (n=7) revealed the immaturity of the field. The reported chatbots addressed a broad variety of chronic diseases (n=6), showcasing a tendency to develop specialized conversational agents for individual chronic conditions. However, comparisons of these chatbots within and between chronic diseases are lacking. In addition, the reported evaluation measures were not standardized, and the addressed health goals showed a large range. Together, these study characteristics complicate comparability and leave room for future research. While natural language processing represented the most used AI technique (n=7) and the majority of conversational agents allowed for multimodal interaction (n=6), the identified studies demonstrated broad heterogeneity, lack of depth of reported AI techniques and systems, and inconsistent usage of taxonomy of the underlying AI software, further limiting the comparability and generalizability of study results. Conclusions: The literature on AI-based conversational agents for chronic conditions is scarce and mostly consists of quasi-experimental studies with chatbots in the prototype stage that use natural language processing and allow for multimodal user interaction. 
Future research could profit from evidence-based evaluation of the AI-based conversational agents and comparison thereof within and between different chronic health conditions. Besides increased comparability, the quality of chatbots developed for specific chronic conditions and their subsequent impact on the target patients could be enhanced by more structured development and standardized evaluation processes. UR - http://www.jmir.org/2020/9/e20701/ UR - http://dx.doi.org/10.2196/20701 UR - http://www.ncbi.nlm.nih.gov/pubmed/32924957 ID - info:doi/10.2196/20701 ER - TY - JOUR AU - ter Stal, Silke AU - Broekhuis, Marijke AU - van Velsen, Lex AU - Hermens, Hermie AU - Tabak, Monique PY - 2020/9/4 TI - Embodied Conversational Agent Appearance for Health Assessment of Older Adults: Explorative Study JO - JMIR Hum Factors SP - e19987 VL - 7 IS - 3 KW - embodied conversational agent KW - appearance design KW - health status assessment KW - older adults KW - eHealth N2 - Background: Embodied conversational agents (ECAs) have great potential for health apps but are rarely investigated as part of such apps. To promote the uptake of health apps, we need to understand how the design of ECAs can influence the preferences, motivation, and behavior of users. Objective: This is one of the first studies that investigates how the appearance of an ECA implemented within a health app affects users' likelihood of following agent advice, their perception of agent characteristics, and their feeling of rapport. In addition, we assessed usability and intention to use. Methods: The ECA was implemented within a frailty assessment app in which three health questionnaires were translated into agent dialogues. In a within-subject experiment, questionnaire dialogues were randomly offered by a young female agent or an older male agent. Participants were asked to think aloud during the interaction. 
Afterward, they rated the likelihood of following the agent's advice, agent characteristics, rapport, usability, and intention to use and participated in a semistructured interview. Results: A total of 20 older adults (mean age 72.2, SD 3.5 years) participated. The older male agent was perceived as more authoritative than the young female agent (P=.03), but no other differences were found. The app scored high on usability (median 6.1) and intention to use (median 6.0). Participants indicated they did not see an added value of the agent to the health app. Conclusions: Agent age and gender have little influence on users' impressions after a short interaction but remain important at first glance in lowering the threshold to interact with the agent. Thus, it is important to take the design of ECAs into account when implementing them into health apps. UR - https://humanfactors.jmir.org/2020/3/e19987 UR - http://dx.doi.org/10.2196/19987 UR - http://www.ncbi.nlm.nih.gov/pubmed/32886068 ID - info:doi/10.2196/19987 ER - TY - JOUR AU - Zhang, Melvyn AU - Smith, Elizabeth Helen PY - 2020/8/21 TI - Digital Tools to Ameliorate Psychological Symptoms Associated With COVID-19: Scoping Review JO - J Med Internet Res SP - e19706 VL - 22 IS - 8 KW - COVID-19 KW - digital tool KW - psychiatry KW - mental health KW - digital health KW - psychology KW - distress KW - stress KW - anxiety KW - depression N2 - Background: In the four months after the discovery of the index case of coronavirus disease (COVID-19), several studies highlighted the psychological impact of COVID-19 on frontline health care workers and on members of the general public. It is evident from these studies that individuals experienced elevated levels of anxiety and depression in the acute phase, when they first became aware of the pandemic, and that the psychological distress persisted into subsequent weeks. 
It is becoming apparent that technological tools such as SMS text messages, web-based interventions, mobile interventions, and conversational agents can help ameliorate psychological distress in the workplace and in society. To our knowledge, there are few publications describing how digital tools have been used to ameliorate psychological symptoms among individuals. Objective: The aim of this review was to identify existing SMS text message, web-based, and mobile interventions and conversational agents that the general public can access to ameliorate the psychological symptoms they are experiencing during the COVID-19 pandemic. Methods: To identify digital tools that were published specifically for COVID-19, a search was performed in the PubMed and MEDLINE databases from the inception of the databases through June 17, 2020. The following search strings were used: "NCOV OR 2019-nCoV OR SARS-CoV-2 OR Coronavirus OR COVID19 OR COVID" and "mHealth OR eHealth OR text". Another search was conducted in PubMed and MEDLINE to identify existing digital tools for depression and anxiety disorders. A web-based search engine (Google) was used to identify if the cited web-based interventions could be accessed. A mobile app search engine, App Annie, was used to determine if the identified mobile apps were commercially available. Results: A total of 6 studies were identified. Of the 6 identified web-based interventions, 5 websites (83%) could be accessed. Of the 32 identified mobile interventions, 7 apps (22%) could be accessed. Of the 7 identified conversational agents, only 2 (29%) could be accessed. Conclusions: The COVID-19 pandemic has caused significant psychological distress. 
Digital tools that are commercially available may be useful for at-risk individuals or individuals with pre-existing psychiatric symptoms. UR - http://www.jmir.org/2020/8/e19706/ UR - http://dx.doi.org/10.2196/19706 UR - http://www.ncbi.nlm.nih.gov/pubmed/32721922 ID - info:doi/10.2196/19706 ER - TY - JOUR AU - Bray, Lucy AU - Sharpe, Ashley AU - Gichuru, Phillip AU - Fortune, Peter-Marc AU - Blake, Lucy AU - Appleton, Victoria PY - 2020/8/11 TI - The Acceptability and Impact of the Xploro Digital Therapeutic Platform to Inform and Prepare Children for Planned Procedures in a Hospital: Before and After Evaluation Study JO - J Med Internet Res SP - e17367 VL - 22 IS - 8 KW - health literacy KW - augmented reality KW - children KW - procedure KW - health KW - artificial intelligence N2 - Background: There is increasing interest in finding novel approaches to improve the preparation of children for hospital procedures such as surgery, x-rays, and blood tests. Well-prepared and informed children have better outcomes (less procedural anxiety and higher satisfaction). A digital therapeutic (DTx) platform (Xploro) was developed with children to provide health information through gamification, serious games, a chatbot, and an augmented reality avatar. Objective: This before and after evaluation study aims to assess the acceptability of the Xploro DTx and examine its impact on children and their parents' procedural knowledge, procedural anxiety, and reported experiences when attending a hospital for a planned procedure. Methods: We used a mixed methods design with quantitative measures and qualitative data collected sequentially from a group of children who received standard hospital information (before group) and a group of children who received the DTx intervention (after group). Participants were children aged between 8 and 14 years and their parents who attended a hospital for a planned clinical procedure at a children's hospital in North West England. 
Children and their parents completed self-report measures (perceived knowledge, procedural anxiety, procedural satisfaction, and procedural involvement) at baseline, preprocedure, and postprocedure. Results: A total of 80 children (n=40 standard care group and n=40 intervention group) and their parents participated in the study; the children were aged between 8 and 14 years (average 10.4, SD 2.27 years) and were attending a hospital for a range of procedures. The children in the intervention group reported significantly lower levels of procedural anxiety before the procedure than those in the standard group (two-tailed t63.64=2.740; P=.008). The children in the intervention group also felt more involved in their procedure than those in the standard group (t75=-2.238; P=.03). The children in the intervention group also reported significantly higher levels of perceived procedural knowledge preprocedure (t59.98=-4.892; P=.001) than those in the standard group. As for parents, those with access to the Xploro intervention reported significantly lower levels of procedural anxiety preprocedure than those who did not (t68.51=1.985; P=.05). During the semistructured "write and tell" interviews, children stated that they enjoyed using the intervention, it was fun and easy to use, and they felt that it had positively influenced their experiences of coming to the hospital for a procedure. Conclusions: This study has shown that the DTx platform, Xploro, has a positive impact on children attending a hospital for a procedure by reducing levels of procedural anxiety. The children and parents in the intervention group described Xploro as improving their experiences and being easy and fun to use. 
UR - http://www.jmir.org/2020/8/e17367/ UR - http://dx.doi.org/10.2196/17367 UR - http://www.ncbi.nlm.nih.gov/pubmed/32780025 ID - info:doi/10.2196/17367 ER - TY - JOUR AU - Tudor Car, Lorainne AU - Dhinagaran, Ardhithy Dhakshenya AU - Kyaw, Myint Bhone AU - Kowatsch, Tobias AU - Joty, Shafiq AU - Theng, Yin-Leng AU - Atun, Rifat PY - 2020/8/7 TI - Conversational Agents in Health Care: Scoping Review and Conceptual Analysis JO - J Med Internet Res SP - e17158 VL - 22 IS - 8 KW - conversational agents KW - chatbots KW - artificial intelligence KW - machine learning KW - mobile phone KW - health care KW - scoping review N2 - Background: Conversational agents, also known as chatbots, are computer programs designed to simulate human text or verbal conversations. They are increasingly used in a range of fields, including health care. By enabling better accessibility, personalization, and efficiency, conversational agents have the potential to improve patient care. Objective: This study aimed to review the current applications, gaps, and challenges in the literature on conversational agents in health care and provide recommendations for their future research, design, and application. Methods: We performed a scoping review. A broad literature search was performed in MEDLINE (Medical Literature Analysis and Retrieval System Online; Ovid), EMBASE (Excerpta Medica database; Ovid), PubMed, Scopus, and Cochrane Central with the search terms "conversational agents," "conversational AI," "chatbots," and associated synonyms. We also searched the gray literature using sources such as the OCLC (Online Computer Library Center) WorldCat database and ResearchGate in April 2019. Reference lists of relevant articles were checked for further articles. Screening and data extraction were performed in parallel by 2 reviewers. The included evidence was analyzed narratively by employing the principles of thematic analysis.
Results: The literature search yielded 47 study reports (45 articles and 2 ongoing clinical trials) that matched the inclusion criteria. The identified conversational agents were largely delivered via smartphone apps (n=23) and used free text only as the main input (n=19) and output (n=30) modality. Case studies describing chatbot development (n=18) were the most prevalent, and only 11 randomized controlled trials were identified. The 3 most commonly reported conversational agent applications in the literature were treatment and monitoring, health care service support, and patient education. Conclusions: The literature on conversational agents in health care is largely descriptive and aimed at treatment and monitoring and health service support. It mostly reports on text-based, artificial intelligence–driven, and smartphone app–delivered conversational agents. There is an urgent need for a robust evaluation of diverse health care conversational agents' formats, focusing on their acceptability, safety, and effectiveness. UR - http://www.jmir.org/2020/8/e17158/ UR - http://dx.doi.org/10.2196/17158 UR - http://www.ncbi.nlm.nih.gov/pubmed/32763886 ID - info:doi/10.2196/17158 ER - TY - JOUR AU - Ferrand, John AU - Hockensmith, Ryli AU - Houghton, Fagen Rebecca AU - Walsh-Buhi, R. Eric PY - 2020/8/3 TI - Evaluating Smart Assistant Responses for Accuracy and Misinformation Regarding Human Papillomavirus Vaccination: Content Analysis Study JO - J Med Internet Res SP - e19018 VL - 22 IS - 8 KW - digital health KW - human papillomavirus KW - smart assistants KW - chatbots KW - conversational agents KW - misinformation KW - infodemiology KW - vaccination N2 - Background: Almost half (46%) of Americans have used a smart assistant of some kind (eg, Apple Siri), and 25% have used a stand-alone smart assistant (eg, Amazon Echo).
This positions smart assistants as potentially useful modalities for retrieving health-related information; however, the accuracy of smart assistant responses lacks rigorous evaluation. Objective: This study aimed to evaluate the levels of accuracy, misinformation, and sentiment in smart assistant responses to human papillomavirus (HPV) vaccination–related questions. Methods: We systematically examined responses to questions about the HPV vaccine from the following four most popular smart assistants: Apple Siri, Google Assistant, Amazon Alexa, and Microsoft Cortana. One team member posed 10 questions to each smart assistant and recorded all queries and responses. Two raters independently coded all responses (κ=0.85). We then assessed differences among the smart assistants in terms of response accuracy, presence of misinformation, and sentiment regarding the HPV vaccine. Results: A total of 103 responses were obtained from the 10 questions posed across the smart assistants. Google Assistant data were excluded owing to nonresponse. Over half (n=63, 61%) of the responses of the remaining three smart assistants were accurate. We found statistically significant differences across the smart assistants (N=103, χ²₂=7.807, P=.02), with Cortana yielding the greatest proportion of misinformation. Siri yielded the greatest proportion of accurate responses (n=26, 72%), whereas Cortana yielded the lowest proportion of accurate responses (n=33, 54%). Most response sentiments across smart assistants were positive (n=65, 64%) or neutral (n=18, 18%), but Cortana's responses yielded the largest proportion of negative sentiment (n=7, 12%). Conclusions: Smart assistants appear to be average-quality sources for HPV vaccination information, with Alexa responding most reliably. Cortana returned the largest proportion of inaccurate responses, the most misinformation, and the greatest proportion of results with negative sentiments.
More collaboration between technology companies and public health entities is necessary to improve the retrieval of accurate health information via smart assistants. UR - https://www.jmir.org/2020/8/e19018 UR - http://dx.doi.org/10.2196/19018 UR - http://www.ncbi.nlm.nih.gov/pubmed/32744508 ID - info:doi/10.2196/19018 ER - TY - JOUR AU - Chattopadhyay, Debaleena AU - Ma, Tengteng AU - Sharifi, Hasti AU - Martyn-Nemeth, Pamela PY - 2020/7/30 TI - Computer-Controlled Virtual Humans in Patient-Facing Systems: Systematic Review and Meta-Analysis JO - J Med Internet Res SP - e18839 VL - 22 IS - 7 KW - virtual humans KW - avatars KW - patient-facing systems KW - meta-analysis KW - conversational agents KW - chatbot KW - digital interlocutors N2 - Background: Virtual humans (VH) are computer-generated characters that appear humanlike and simulate face-to-face conversations using verbal and nonverbal cues. Unlike formless conversational agents, like smart speakers or chatbots, VH bring together the capabilities of both a conversational agent and an interactive avatar (computer-represented digital characters). Although their use in patient-facing systems has garnered substantial interest, it is unknown to what extent VH are effective in health applications. Objective: The purpose of this review was to examine the effectiveness of VH in patient-facing systems. The design and implementation characteristics of these systems were also examined. Methods: Electronic bibliographic databases were searched for peer-reviewed articles with relevant key terms. Studies were included in the systematic review if they designed or evaluated VH in patient-facing systems. Of the included studies, studies that used a randomized controlled trial to evaluate VH were included in the meta-analysis; they were then summarized using the PICOTS framework (population, intervention, comparison group, outcomes, time frame, setting). 
Summary effect sizes, using random-effects models, were calculated, and the risk of bias was assessed. Results: Among the 8,125 unique records identified, 53 articles describing 33 unique systems were qualitatively and systematically reviewed. Two distinct design categories emerged: simple VH and VH augmented with health sensors and trackers. Of the 53 articles, 16 (26 studies) with 44 primary and 22 secondary outcomes were included in the meta-analysis. Meta-analysis of the 44 primary outcome measures revealed a significant difference between intervention and control conditions, favoring the VH intervention (SMD=0.166, 95% CI 0.039-0.292, P=.012), but with evidence of some heterogeneity, I²=49.3%. There were more cross-sectional (k=15) than longitudinal studies (k=11). The intervention was delivered using a personal computer in most studies (k=18), followed by a tablet (k=4), mobile kiosk (k=2), head-mounted display (k=1), and a desktop computer in a community center (k=1). Conclusions: We offer evidence for the efficacy of VH in patient-facing systems. Considering that studies included different population and outcome types, more focused analysis is needed in the future. Future studies also need to identify what features of virtual human interventions contribute toward their effectiveness. UR - http://www.jmir.org/2020/7/e18839/ UR - http://dx.doi.org/10.2196/18839 UR - http://www.ncbi.nlm.nih.gov/pubmed/32729837 ID - info:doi/10.2196/18839 ER - TY - JOUR AU - Anthony, A. Chris AU - Rojas, Octavio Edward AU - Keffala, Valerie AU - Glass, Ann Natalie AU - Shah, S. Apurva AU - Miller, J. Benjamin AU - Hogue, Matthew AU - Willey, C.
Michael AU - Karam, Matthew AU - Marsh, Lawrence John PY - 2020/7/29 TI - Acceptance and Commitment Therapy Delivered via a Mobile Phone Messaging Robot to Decrease Postoperative Opioid Use in Patients With Orthopedic Trauma: Randomized Controlled Trial JO - J Med Internet Res SP - e17750 VL - 22 IS - 7 KW - acceptance and commitment therapy KW - opioid crisis KW - patient-reported outcome measures KW - postoperative pain KW - orthopedics KW - text messaging KW - chatbot KW - conversational agents KW - mHealth N2 - Background: Acceptance and commitment therapy (ACT) is a pragmatic approach to help individuals decrease avoidable pain. Objective: This study aims to evaluate the effects of ACT delivered via an automated mobile messaging robot on postoperative opioid use and patient-reported outcomes (PROs) in patients with orthopedic trauma who underwent operative intervention for their injuries. Methods: Adult patients presenting to a level 1 trauma center who underwent operative fixation of a traumatic upper or lower extremity fracture and who used mobile phone text messaging were eligible for the study. Patients were randomized in a 1:1 ratio to either the intervention group, who received twice-daily mobile phone messages communicating an ACT-based intervention for the first 2 weeks after surgery, or the control group, who received no messages. Baseline PROs were completed. Two weeks after the operative intervention, follow-up was performed in the form of an opioid medication pill count and postoperative administration of PROs. The mean number of opioid tablets used by patients was calculated and compared between groups. The mean PRO scores were also compared between the groups. Results: A total of 82 subjects were enrolled in the study. Of the 82 participants, 76 (38 ACT and 38 controls) completed the study. No differences between groups in demographic factors were identified. 
The intervention group used an average of 26.1 (SD 21.4) opioid tablets, whereas the control group used 41.1 (SD 22.0) tablets, resulting in 36.5% ([41.1-26.1]/41.1) fewer tablets used by subjects receiving the mobile phone–based ACT intervention (P=.004). The intervention group subjects reported a lower postoperative Patient-Reported Outcomes Measurement Information System Pain Intensity score (mean 45.9, SD 7.2) than control group subjects (mean 49.7, SD 8.8; P=.04). Conclusions: In this study, the delivery of an ACT-based intervention via an automated mobile messaging robot in the acute postoperative period decreased opioid use in selected patients with orthopedic trauma. Participants receiving the ACT-based intervention also reported lower pain intensity after 2 weeks, although this may not represent a clinically important difference. Trial Registration: ClinicalTrials.gov NCT03991546; https://clinicaltrials.gov/ct2/show/NCT03991546 UR - https://www.jmir.org/2020/7/e17750 UR - http://dx.doi.org/10.2196/17750 UR - http://www.ncbi.nlm.nih.gov/pubmed/32723723 ID - info:doi/10.2196/17750 ER - TY - JOUR AU - Baptista, Shaira AU - Wadley, Greg AU - Bird, Dominique AU - Oldenburg, Brian AU - Speight, Jane PY - 2020/7/22 TI - Acceptability of an Embodied Conversational Agent for Type 2 Diabetes Self-Management Education and Support via a Smartphone App: Mixed Methods Study JO - JMIR Mhealth Uhealth SP - e17038 VL - 8 IS - 7 KW - embodied conversational agent KW - type 2 diabetes KW - mobile apps KW - mHealth KW - smartphone KW - self-management KW - mobile phone N2 - Background: Embodied conversational agents (ECAs) are increasingly used in health care apps; however, their acceptability in type 2 diabetes (T2D) self-management apps has not yet been investigated. Objective: This study aimed to evaluate the acceptability of the ECA (Laura) used to deliver diabetes self-management education and support in the My Diabetes Coach (MDC) app.
Methods: A sequential mixed methods design was applied. Adults with T2D allocated to the intervention arm of the MDC trial used the MDC app over a period of 12 months. At 6 months, they completed questions assessing their interaction with, and attitudes toward, the ECA. In-depth qualitative interviews were conducted with a subsample of the participants from the intervention arm to explore their experiences of using the ECA. The interview questions included the participants' perceptions of Laura, including their initial impression of her (and how this changed over time), her personality, and human character. The quantitative and qualitative data were interpreted using integrated synthesis. Results: Of the 93 intervention participants, 44 (47%) were women; the mean age of the participants was 55 (SD 10) years, and the baseline glycated hemoglobin A1c level was 7.3% (SD 1.5%). Overall, 66 of the 93 participants (71%) provided survey responses. Of these, most described Laura as being helpful (57/66, 86%), friendly (57/66, 86%), competent (56/66, 85%), trustworthy (48/66, 73%), and likable (40/66, 61%). Some described Laura as not real (18/66, 27%), boring (26/66, 39%), and annoying (20/66, 30%). Participants reported that interacting with Laura made them feel more motivated (29/66, 44%), comfortable (24/66, 36%), confident (14/66, 21%), happy (11/66, 17%), and hopeful (8/66, 12%). Furthermore, 20% (13/66) of the participants were frustrated by their interaction with Laura, and 17% (11/66) of the participants reported that interacting with Laura made them feel guilty. A total of 4 themes emerged from the qualitative data (N=19): (1) perceived role: a friendly coach rather than a health professional; (2) perceived support: emotional and motivational support; (3) embodiment preference: acceptability of a human-like character; and (4) room for improvement: need for greater congruence between Laura's words and actions.
Conclusions: These findings suggest that an ECA is an acceptable means to deliver T2D self-management education and support. A human-like character providing ongoing, friendly, nonjudgmental, emotional, and motivational support is well received. Nevertheless, the ECA can be improved by increasing congruence between its verbal and nonverbal communication and accommodating user preferences. Trial Registration: Australian New Zealand Clinical Trials Registry CTRN12614001229662; https://tinyurl.com/yxshn6pd UR - https://mhealth.jmir.org/2020/7/e17038 UR - http://dx.doi.org/10.2196/17038 UR - http://www.ncbi.nlm.nih.gov/pubmed/32706734 ID - info:doi/10.2196/17038 ER - TY - JOUR AU - Abd-Alrazaq, Ali Alaa AU - Rababeh, Asma AU - Alajlani, Mohannad AU - Bewick, M. Bridgette AU - Househ, Mowafa PY - 2020/7/13 TI - Effectiveness and Safety of Using Chatbots to Improve Mental Health: Systematic Review and Meta-Analysis JO - J Med Internet Res SP - e16021 VL - 22 IS - 7 KW - chatbots KW - conversational agents KW - mental health KW - mental disorders KW - depression KW - anxiety KW - effectiveness KW - safety N2 - Background: The global shortage of mental health workers has prompted the utilization of technological advancements, such as chatbots, to meet the needs of people with mental health conditions. Chatbots are systems that are able to converse and interact with human users using spoken, written, and visual language. While numerous studies have assessed the effectiveness and safety of using chatbots in mental health, no reviews have pooled the results of those studies. Objective: This study aimed to assess the effectiveness and safety of using chatbots to improve mental health through summarizing and pooling the results of previous studies. Methods: A systematic review was carried out to achieve this objective. The search sources were 7 bibliographic databases (eg, MEDLINE, EMBASE, PsycINFO), the search engine "Google Scholar,"
and backward and forward reference list checking of the included studies and relevant reviews. Two reviewers independently selected the studies, extracted data from the included studies, and assessed the risk of bias. Data extracted from studies were synthesized using narrative and statistical methods, as appropriate. Results: Of 1048 citations retrieved, we identified 12 studies examining the effect of using chatbots on 8 outcomes. Weak evidence demonstrated that chatbots were effective in improving depression, distress, stress, and acrophobia. In contrast, according to similar evidence, there was no statistically significant effect of using chatbots on subjective psychological wellbeing. Results were conflicting regarding the effect of chatbots on the severity of anxiety and positive and negative affect. Only two studies assessed the safety of chatbots and concluded that they are safe in mental health, as no adverse events or harms were reported. Conclusions: Chatbots have the potential to improve mental health. However, the evidence in this review was not sufficient to definitively conclude this due to a lack of evidence that their effect is clinically important, a lack of studies assessing each outcome, high risk of bias in those studies, and conflicting results for some outcomes. Further studies are required to draw solid conclusions about the effectiveness and safety of chatbots. Trial Registration: PROSPERO International Prospective Register of Systematic Reviews CRD42019141219; https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42019141219 UR - http://www.jmir.org/2020/7/e16021/ UR - http://dx.doi.org/10.2196/16021 UR - http://www.ncbi.nlm.nih.gov/pubmed/32673216 ID - info:doi/10.2196/16021 ER - TY - JOUR AU - Linden, Brooke AU - Tam-Seto, Linna AU - Stuart, Heather PY - 2020/6/17 TI - Adherence of the #Here4U App –
Military Version to Criteria for the Development of Rigorous Mental Health Apps JO - JMIR Form Res SP - e18890 VL - 4 IS - 6 KW - mental health services KW - telemedicine KW - mHealth KW - chatbot KW - e-solutions KW - Canadian Armed Forces KW - military health KW - mobile phone N2 - Background: Over the past several years, the emergence of mobile mental health apps has increased as a potential solution for populations who may face logistical and social barriers to traditional service delivery, including individuals connected to the military. Objective: The goal of the #Here4U App – Military Version is to provide evidence-informed mental health support to members of Canada's military community, leveraging artificial intelligence in the form of IBM Canada's Watson Assistant to carry on unique text-based conversations with users, identify presenting mental health concerns, and refer users to self-help resources or recommend professional health care where appropriate. Methods: As the availability and use of mental health apps has increased, so too has the list of recommendations and guidelines for efficacious development. We describe the development and testing conducted between 2018 and 2020 and assess the quality of the #Here4U App against 16 criteria for rigorous mental health app development, as identified by Bakker and colleagues in 2016. Results: The #Here4U App – Military Version met the majority of Bakker and colleagues' criteria, with those unmet considered not applicable to this particular product or out of scope for research conducted to date. Notably, a formal evaluation of the efficacy of the app is a major priority moving forward. Conclusions: The #Here4U App – Military Version is a promising new mental health e-solution for members of the Canadian Armed Forces community, filling many of the gaps left by traditional service delivery.
UR - https://formative.jmir.org/2020/6/e18890 UR - http://dx.doi.org/10.2196/18890 UR - http://www.ncbi.nlm.nih.gov/pubmed/32554374 ID - info:doi/10.2196/18890 ER - TY - JOUR AU - Abd-Alrazaq, Alaa AU - Safi, Zeineb AU - Alajlani, Mohannad AU - Warren, Jim AU - Househ, Mowafa AU - Denecke, Kerstin PY - 2020/6/5 TI - Technical Metrics Used to Evaluate Health Care Chatbots: Scoping Review JO - J Med Internet Res SP - e18301 VL - 22 IS - 6 KW - chatbots KW - conversational agents KW - health care KW - evaluation KW - metrics N2 - Background: Dialog agents (chatbots) have a long history of application in health care, where they have been used for tasks such as supporting patient self-management and providing counseling. Their use is expected to grow with increasing demands on health systems and improving artificial intelligence (AI) capability. Approaches to the evaluation of health care chatbots, however, appear to be diverse and haphazard, resulting in a potential barrier to the advancement of the field. Objective: This study aims to identify the technical (nonclinical) metrics used by previous studies to evaluate health care chatbots. Methods: Studies were identified by searching 7 bibliographic databases (eg, MEDLINE and PsycINFO) in addition to conducting backward and forward reference list checking of the included studies and relevant reviews. The studies were independently selected by two reviewers who then extracted data from the included studies. Extracted data were synthesized narratively by grouping the identified metrics into categories based on the aspect of chatbots that the metrics evaluated. Results: Of the 1498 citations retrieved, 65 studies were included in this review. 
Chatbots were evaluated using 27 technical metrics, which were related to chatbots as a whole (eg, usability, classifier performance, speed), response generation (eg, comprehensibility, realism, repetitiveness), response understanding (eg, chatbot understanding as assessed by users, word error rate, concept error rate), and esthetics (eg, appearance of the virtual agent, background color, and content). Conclusions: The technical metrics of health chatbot studies were diverse, with survey designs and global usability metrics dominating. The lack of standardization and paucity of objective measures make it difficult to compare the performance of health chatbots and could inhibit advancement of the field. We suggest that researchers more frequently include metrics computed from conversation logs. In addition, we recommend the development of a framework of technical metrics with recommendations for specific circumstances for their inclusion in chatbot studies. UR - http://www.jmir.org/2020/6/e18301/ UR - http://dx.doi.org/10.2196/18301 UR - http://www.ncbi.nlm.nih.gov/pubmed/32442157 ID - info:doi/10.2196/18301 ER - TY - JOUR AU - Fu, Weifeng PY - 2020/6/3 TI - Application of an Isolated Word Speech Recognition System in the Field of Mental Health Consultation: Development and Usability Study JO - JMIR Med Inform SP - e18677 VL - 8 IS - 6 KW - speech recognition KW - isolated words KW - mental health KW - small vocabulary KW - HMM KW - hidden Markov model KW - programming N2 - Background: Speech recognition is a technology that enables machines to understand human language. Objective: In this study, speech recognition of isolated words from a small vocabulary was applied to the field of mental health counseling. Methods: A software platform was used to establish a human-machine chat for psychological counselling. The software uses voice recognition technology to decode the user's voice information. 
The software system analyzes and processes the user's voice information according to many internal related databases, and then gives the user accurate feedback. For users who need psychological treatment, the system provides them with psychological education. Results: The speech recognition system included features such as speech extraction, endpoint detection, feature value extraction, training data, and speech recognition. Conclusions: The Hidden Markov Model was adopted, based on multithread programming under a VC2005 compilation environment, to realize the parallel operation of the algorithm and improve the efficiency of speech recognition. After the design was completed, simulation debugging was performed in the laboratory. The experimental results showed that the designed program met the basic requirements of a speech recognition system. UR - https://medinform.jmir.org/2020/6/e18677 UR - http://dx.doi.org/10.2196/18677 UR - http://www.ncbi.nlm.nih.gov/pubmed/32384054 ID - info:doi/10.2196/18677 ER - TY - JOUR AU - Chen, Jessica AU - Lyell, David AU - Laranjo, Liliana AU - Magrabi, Farah PY - 2020/6/1 TI - Effect of Speech Recognition on Problem Solving and Recall in Consumer Digital Health Tasks: Controlled Laboratory Experiment JO - J Med Internet Res SP - e14827 VL - 22 IS - 6 KW - speech recognition software KW - consumer health informatics KW - ergonomics N2 - Background: Recent advances in natural language processing and artificial intelligence have led to widespread adoption of speech recognition technologies. In consumer health applications, speech recognition is usually applied to support interactions with conversational agents for data collection, decision support, and patient monitoring. However, little is known about the use of speech recognition in consumer health applications and few studies have evaluated the efficacy of conversational agents in the hands of consumers. 
In other consumer-facing tools, cognitive load has been observed to be an important factor affecting the use of speech recognition technologies in tasks involving problem solving and recall. Users find it more difficult to think and speak at the same time when compared to typing, pointing, and clicking. However, the effects of speech recognition on cognitive load when performing health tasks have not yet been explored. Objective: The aim of this study was to evaluate the use of speech recognition for documentation in consumer digital health tasks involving problem solving and recall. Methods: Fifty university staff and students were recruited to undertake four documentation tasks with a simulated conversational agent in a computer laboratory. The tasks varied in complexity determined by the amount of problem solving and recall required (simple and complex) and the input modality (speech recognition vs keyboard and mouse). Cognitive load, task completion time, error rate, and usability were measured. Results: Compared to using a keyboard and mouse, speech recognition significantly increased the cognitive load for complex tasks (Z=−4.08, P<.001) and simple tasks (Z=−2.24, P=.03). Complex tasks took significantly longer to complete (Z=−2.52, P=.01) and speech recognition was found to be overall less usable than a keyboard and mouse (Z=−3.30, P=.001). However, there was no effect on errors. Conclusions: Use of a keyboard and mouse was preferable to speech recognition for complex tasks involving problem solving and recall. Further studies using a broader variety of consumer digital health tasks of varying complexity are needed to investigate the contexts in which use of speech recognition is most appropriate. The effects of cognitive load on task performance and its significance also need to be investigated.
UR - https://www.jmir.org/2020/6/e14827 UR - http://dx.doi.org/10.2196/14827 UR - http://www.ncbi.nlm.nih.gov/pubmed/32442129 ID - info:doi/10.2196/14827 ER - TY - JOUR AU - Bennion, Russell Matthew AU - Hardy, E. Gillian AU - Moore, K. Roger AU - Kellett, Stephen AU - Millings, Abigail PY - 2020/5/27 TI - Usability, Acceptability, and Effectiveness of Web-Based Conversational Agents to Facilitate Problem Solving in Older Adults: Controlled Study JO - J Med Internet Res SP - e16794 VL - 22 IS - 5 KW - transdiagnostic KW - method of levels KW - system usability KW - acceptability KW - effectiveness KW - mental health KW - conversational agents KW - older adults KW - chatbots KW - web-based N2 - Background: The usability and effectiveness of conversational agents (chatbots) that deliver psychological therapies is under-researched. Objective: This study aimed to compare the system usability, acceptability, and effectiveness in older adults of 2 Web-based conversational agents that differ in theoretical orientation and approach. Methods: In a randomized study, 112 older adults were allocated to 1 of the following 2 fully automated interventions: Manage Your Life Online (MYLO; ie, a chatbot that mimics a therapist using a method of levels approach) and ELIZA (a chatbot that mimics a therapist using a humanistic counseling approach). The primary outcome was problem distress and resolution, with secondary outcome measures of system usability and clinical outcome. Results: MYLO participants spent significantly longer interacting with the conversational agent. Post hoc tests indicated that MYLO participants had significantly lower problem distress at follow-up. There were no differences between MYLO and ELIZA in terms of problem resolution. MYLO was rated as significantly more helpful and likely to be used again. System usability of both the conversational agents was associated with helpfulness of the agents and the willingness of the participants to reuse.
Adherence was high. A total of 12% (7/59) of the MYLO group did not carry out their conversation with the chatbot. Conclusions: Controlled studies of chatbots need to be conducted in clinical populations across different age groups. The potential integration of chatbots into psychological care in routine services is discussed. UR - http://www.jmir.org/2020/5/e16794/ UR - http://dx.doi.org/10.2196/16794 UR - http://www.ncbi.nlm.nih.gov/pubmed/32384055 ID - info:doi/10.2196/16794 ER - TY - JOUR AU - Zand, Aria AU - Sharma, Arjun AU - Stokes, Zack AU - Reynolds, Courtney AU - Montilla, Alberto AU - Sauk, Jenny AU - Hommes, Daniel PY - 2020/5/26 TI - An Exploration Into the Use of a Chatbot for Patients With Inflammatory Bowel Diseases: Retrospective Cohort Study JO - J Med Internet Res SP - e15589 VL - 22 IS - 5 KW - chatbots KW - inflammatory bowel diseases KW - eHealth KW - artificial intelligence KW - telehealth KW - natural language processing N2 - Background: The emergence of chatbots in health care is fast approaching. Data on the feasibility of chatbots for chronic disease management are scarce. Objective: This study aimed to explore the feasibility of utilizing natural language processing (NLP) for the categorization of electronic dialog data of patients with inflammatory bowel diseases (IBD) for use in the development of a chatbot. Methods: Electronic dialog data collected between 2013 and 2018 from a care management platform (UCLA eIBD) at a tertiary referral center for IBD at the University of California, Los Angeles, were used. Part of the data was manually reviewed, and an algorithm for categorization was created. The algorithm categorized all relevant dialogs into a set number of categories using NLP. In addition, 3 independent physicians evaluated the appropriateness of the categorization. Results: A total of 16,453 lines of dialog were collected and analyzed. We categorized 8324 messages from 424 patients into seven categories. 
As there was an overlap in these categories, their frequencies were measured independently as symptoms (2033/6193, 32.83%), medications (2397/6193, 38.70%), appointments (1518/6193, 24.51%), laboratory investigations (2106/6193, 34.01%), finance or insurance (447/6193, 7.22%), communications (2161/6193, 34.89%), procedures (617/6193, 9.96%), and miscellaneous (624/6193, 10.08%). Furthermore, in 95.0% (285/300) of cases, there were minor or no differences in categorization between the algorithm and the three independent physicians. Conclusions: With increased adaptation of electronic health technologies, chatbots could have great potential in interacting with patients, collecting data, and increasing efficiency. Our categorization showcases the feasibility of using NLP in large amounts of electronic dialog for the development of a chatbot algorithm. Chatbots could allow for the monitoring of patients beyond consultations and potentially empower and educate patients and improve clinical outcomes. UR - http://www.jmir.org/2020/5/e15589/ UR - http://dx.doi.org/10.2196/15589 UR - http://www.ncbi.nlm.nih.gov/pubmed/32452808 ID - info:doi/10.2196/15589 ER - TY - JOUR AU - Arem, Hannah AU - Scott, Remle AU - Greenberg, Daniel AU - Kaltman, Rebecca AU - Lieberman, Daniel AU - Lewin, Daniel PY - 2020/5/26 TI - Assessing Breast Cancer Survivors' Perceptions of Using Voice-Activated Technology to Address Insomnia: Feasibility Study Featuring Focus Groups and In-Depth Interviews JO - JMIR Cancer SP - e15859 VL - 6 IS - 1 KW - artificial intelligence KW - breast neoplasms KW - survivors KW - insomnia KW - cognitive behavioral therapy KW - mobile phones N2 - Background: Breast cancer survivors (BCSs) are a growing population with a higher prevalence of insomnia than women of the same age without a history of cancer. Cognitive behavioral therapy for insomnia (CBT-I) has been shown to be effective in this population, but it is not widely available to those who need it. 
Objective: This study aimed to better understand BCSs' experiences with insomnia and to explore the feasibility and acceptability of delivering CBT-I using a virtual assistant (Amazon Alexa). Methods: We first conducted a formative phase with 2 focus groups and 3 in-depth interviews to understand BCSs' perceptions of insomnia as well as their interest in and comfort with using a virtual assistant to learn about CBT-I. We then developed a prototype incorporating participant preferences and CBT-I components and demonstrated it in group and individual settings to BCSs to evaluate acceptability, interest, perceived feasibility, educational potential, and usability of the prototype. We also collected open-ended feedback on the content and used frequencies to describe the quantitative data. Results: We recruited 11 BCSs with insomnia in the formative phase and 14 BCSs in the prototype demonstration. In formative work, anxiety, fear, and hot flashes were identified as causes of insomnia. After prototype demonstration, nearly 79% (11/14) of participants reported an interest in and perceived feasibility of using the virtual assistant to record sleep patterns. Approximately two-thirds of the participants thought lifestyle modification (9/14, 64%) and sleep restriction (9/14, 64%) would be feasible and were interested in this feature of the program (10/14, 71% and 9/14, 64%, respectively). Relaxation exercises were rated as interesting and feasible using the virtual assistant by 71% (10/14) of the participants. Usability was rated as better than average, and all women reported that they would recommend the program to friends and family. Conclusions: This virtual assistant prototype delivering CBT-I components by using a smart speaker was rated as feasible and acceptable, suggesting that this prototype should be fully developed and tested for efficacy in the BCS population. 
If efficacy is shown in this population, the prototype should also be adapted for other high-risk populations. UR - http://cancer.jmir.org/2020/1/e15859/ UR - http://dx.doi.org/10.2196/15859 UR - http://www.ncbi.nlm.nih.gov/pubmed/32348274 ID - info:doi/10.2196/15859 ER - TY - JOUR AU - Piao, Meihua AU - Ryu, Hyeongju AU - Lee, Hyeongsuk AU - Kim, Jeongeun PY - 2020/5/19 TI - Use of the Healthy Lifestyle Coaching Chatbot App to Promote Stair-Climbing Habits Among Office Workers: Exploratory Randomized Controlled Trial JO - JMIR Mhealth Uhealth SP - e15085 VL - 8 IS - 5 KW - exercise KW - habits KW - reward KW - health behavior KW - healthy lifestyle N2 - Background: Lack of time for exercise is common among office workers given their busy lives. Because of occupational restrictions and difficulty in taking time off, it is necessary to suggest effective ways for workers to exercise regularly. Sustaining lifestyle habits that increase nonexercise activity in daily life can solve the issue of lack of exercise time. Healthy Lifestyle Coaching Chatbot is a messenger app based on the habit formation model that can be used as a tool to provide a health behavior intervention that emphasizes the importance of sustainability and involvement. Objective: This study aimed to assess the efficacy of the Healthy Lifestyle Coaching Chatbot intervention presented via a messenger app aimed at stair-climbing habit formation for office workers. Methods: From February 1, 2018, to April 30, 2018, a total of 106 people participated in the trial after online recruitment. Participants were randomly assigned to the intervention group (n=57) or the control group (n=49). The intervention group received cues and intrinsic and extrinsic rewards for the entire 12 weeks. However, the control group did not receive intrinsic rewards for the first 4 weeks and only received all rewards as in the intervention group from the fifth to twelfth week. 
The Self-Report Habit Index (SRHI) of participants was evaluated every week, and the level of physical activity was measured at the beginning and end of the trial. SPSS Statistics version 21 (IBM Corp) was used for statistical analysis. Results: After 4 weeks of intervention without providing the intrinsic rewards in the control group, the change in SRHI scores was 13.54 (SD 14.99) in the intervention group and 6.42 (SD 9.42) in the control group, indicating a significant difference between the groups (P=.04). When all rewards were given to both groups, from the fifth to twelfth week, the change in SRHI scores of the intervention and control groups was comparable at 12.08 (SD 10.87) and 15.88 (SD 13.29), respectively (P=.21). However, the level of physical activity showed a significant difference between the groups after 12 weeks of intervention (P=.045). Conclusions: This study provides evidence that intrinsic rewards are important to enhance the sustainability and effectiveness of an intervention. The Healthy Lifestyle Coaching Chatbot program can be a cost-effective method for healthy habit formation. 
Trial Registration: Clinical Research Information Service KCT0004009; https://tinyurl.com/w4oo7md UR - https://mhealth.jmir.org/2020/5/e15085 UR - http://dx.doi.org/10.2196/15085 UR - http://www.ncbi.nlm.nih.gov/pubmed/32427114 ID - info:doi/10.2196/15085 ER - TY - JOUR AU - Espinoza, Juan AU - Crown, Kelly AU - Kulkarni, Omkar PY - 2020/4/30 TI - A Guide to Chatbots for COVID-19 Screening at Pediatric Health Care Facilities JO - JMIR Public Health Surveill SP - e18808 VL - 6 IS - 2 KW - chatbots KW - COVID-19 KW - pediatrics KW - digital health KW - screening UR - http://publichealth.jmir.org/2020/2/e18808/ UR - http://dx.doi.org/10.2196/18808 UR - http://www.ncbi.nlm.nih.gov/pubmed/32325425 ID - info:doi/10.2196/18808 ER - TY - JOUR AU - Hauser-Ulrich, Sandra AU - Künzli, Hansjörg AU - Meier-Peterhans, Danielle AU - Kowatsch, Tobias PY - 2020/4/3 TI - A Smartphone-Based Health Care Chatbot to Promote Self-Management of Chronic Pain (SELMA): Pilot Randomized Controlled Trial JO - JMIR Mhealth Uhealth SP - e15806 VL - 8 IS - 4 KW - conversational agent KW - chatbot KW - digital health KW - pain self-management KW - cognitive behavior therapy KW - smartphone KW - psychoeducation KW - text-based KW - health care KW - chronic pain N2 - Background: Ongoing pain is one of the most common diseases and has major physical, psychological, social, and economic impacts. A mobile health intervention utilizing a fully automated text-based health care chatbot (TBHC) may offer an innovative way not only to deliver coping strategies and psychoeducation for pain management but also to build a working alliance between a participant and the TBHC. 
Objective: The objectives of this study are twofold: (1) to describe the design and implementation of the pain SELf-MAnagement chatbot (SELMA), a 2-month smartphone-based cognitive behavior therapy (CBT) TBHC intervention for pain self-management in patients with ongoing or cyclic pain, and (2) to present findings from a pilot randomized controlled trial, in which effectiveness, influence of intention to change behavior, pain duration, working alliance, acceptance, and adherence were evaluated. Methods: Participants were recruited online and in collaboration with pain experts, and were randomized to interact with SELMA for 8 weeks either every day or every other day concerning CBT-based pain management (n=59), or weekly concerning content not related to pain management (n=43). Pain-related impairment (primary outcome), general well-being, pain intensity, and the bond scale of working alliance were measured at baseline and postintervention. Intention to change behavior and pain duration were measured at baseline only, and acceptance postintervention was assessed via self-reporting instruments. Adherence was assessed via usage data. Results: From May 2018 to August 2018, 311 adults downloaded the SELMA app, 102 of whom consented to participate and met the inclusion criteria. The average age of the women (88/102, 86.4%) and men (14/102, 13.6%) participating was 43.7 (SD 12.7) years. Baseline group comparison did not differ with respect to any demographic or clinical variable. The intervention group reported no significant change in pain-related impairment (P=.68) compared to the control group postintervention. The intention to change behavior was positively related to pain-related impairment (P=.01) and pain intensity (P=.01). Working alliance with the TBHC SELMA was comparable to that obtained in guided internet therapies with human coaches. Participants enjoyed using the app, perceiving it as useful and easy to use. 
Participants of the intervention group replied with an average answer ratio of 0.71 (SD 0.20) to 200 (SD 58.45) conversations initiated by SELMA. Participants' comments revealed an appreciation of the empathic and responsible interaction with the TBHC SELMA. A main criticism was that there was no option to enter free text for the patients' own comments. Conclusions: SELMA is feasible, as revealed mainly by positive feedback and valuable suggestions for future revisions. For example, the participants' intention to change behavior or a more homogenous sample (eg, with a specific type of chronic pain) should be considered in further tailoring of SELMA. Trial Registration: German Clinical Trials Register DRKS00017147; https://tinyurl.com/vx6n6sx, Swiss National Clinical Trial Portal: SNCTP000002712; https://www.kofam.ch/de/studienportal/suche/70582/studie/46326. UR - http://mhealth.jmir.org/2020/4/e15806/ UR - http://dx.doi.org/10.2196/15806 UR - http://www.ncbi.nlm.nih.gov/pubmed/32242820 ID - info:doi/10.2196/15806 ER - TY - JOUR AU - Hurmuz, M. Marian Z. AU - Jansen-Kosterink, M. Stephanie AU - op den Akker, Harm AU - Hermens, J. Hermie PY - 2020/4/3 TI - User Experience and Potential Health Effects of a Conversational Agent-Based Electronic Health Intervention: Protocol for an Observational Cohort Study JO - JMIR Res Protoc SP - e16641 VL - 9 IS - 4 KW - virtual coaching KW - effectiveness KW - user experience KW - evaluation protocol KW - older adults KW - adults KW - type 2 diabetes mellitus KW - chronic pain KW - healthy lifestyle N2 - Background: While the average human life expectancy has increased remarkably, the length of life with chronic conditions has also increased. To limit the occurrence of chronic conditions and comorbidities, it is important to adopt a healthy lifestyle. Within the European project "Council of Coaches," a personalized coaching platform was developed that supports developing and maintaining a healthy lifestyle. 
Objective: The primary aim of this study is to assess the user experience with and the use and potential health effects of a fully working Council of Coaches system implemented in a real-world setting among the target population, specifically older adults or adults with type 2 diabetes mellitus or chronic pain. Methods: An observational cohort study with a pretest-posttest design will be conducted. The study population will be a dynamic cohort consisting of older adults, aged ≥55 years, as well as adults aged ≥18 years with type 2 diabetes mellitus or chronic pain. Each participant will interact in a fully automated manner with Council of Coaches for 5 to 9 weeks. The primary outcomes are user experience, use of the program, and potential effects (health-related factors). Secondary outcomes include demographics, applicability of the virtual coaches, and user interaction with the virtual coaches. Results: Recruitment started in December 2019 and is conducted through mass mailing, snowball sampling, and advertisements in newspapers and social media. This study is expected to conclude in August 2020. Conclusions: The results of this study will either confirm or reject the hypothesis that a group of virtual embodied conversational coaches can keep users engaged over several weeks of interaction and contribute to positive health outcomes. 
Trial Registration: The Netherlands Trial Register: NL7911; https://www.trialregister.nl/trial/7911 International Registered Report Identifier (IRRID): PRR1-10.2196/16641 UR - https://www.researchprotocols.org/2020/4/e16641 UR - http://dx.doi.org/10.2196/16641 UR - http://www.ncbi.nlm.nih.gov/pubmed/32242517 ID - info:doi/10.2196/16641 ER - TY - JOUR AU - Ta, Vivian AU - Griffith, Caroline AU - Boatfield, Carolynn AU - Wang, Xinyu AU - Civitello, Maria AU - Bader, Haley AU - DeCero, Esther AU - Loggarakis, Alexia PY - 2020/3/6 TI - User Experiences of Social Support From Companion Chatbots in Everyday Contexts: Thematic Analysis JO - J Med Internet Res SP - e16235 VL - 22 IS - 3 KW - artificial intelligence KW - social support KW - artificial agents KW - chatbots KW - interpersonal relations N2 - Background: Previous research suggests that artificial agents may be a promising source of social support for humans. However, the bulk of this research has been conducted in the context of social support interventions that specifically address stressful situations or health improvements. Little research has examined social support received from artificial agents in everyday contexts. Objective: Considering that social support manifests in not only crises but also everyday situations and that everyday social support forms the basis of support received during more stressful events, we aimed to investigate the types of everyday social support that can be received from artificial agents. Methods: In Study 1, we examined publicly available user reviews (N=1854) of Replika, a popular companion chatbot. In Study 2, a sample (n=66) of Replika users provided detailed open-ended responses regarding their experiences of using Replika. We conducted thematic analysis on both datasets to gain insight into the kind of everyday social support that users receive through interactions with Replika. 
Results: Replika provides some level of companionship that can help curtail loneliness, provide a "safe space" in which users can discuss any topic without the fear of judgment or retaliation, increase positive affect through uplifting and nurturing messages, and provide helpful information/advice when normal sources of informational support are not available. Conclusions: Artificial agents may be a promising source of everyday social support, particularly companionship, emotional, informational, and appraisal support, but not as tangible support. Future studies are needed to determine who might benefit from these types of everyday social support the most and why. These results could potentially be used to help address global health issues or other crises early on in everyday situations before they potentially manifest into larger issues. UR - http://www.jmir.org/2020/2/e16235/ UR - http://dx.doi.org/10.2196/16235 UR - http://www.ncbi.nlm.nih.gov/pubmed/32141837 ID - info:doi/10.2196/16235 ER - TY - JOUR AU - García-Carbajal, Santiago AU - Pipa-Muniz, María AU - Múgica, Luis Jose PY - 2020/2/27 TI - Using String Metrics to Improve the Design of Virtual Conversational Characters: Behavior Simulator Development Study JO - JMIR Serious Games SP - e15349 VL - 8 IS - 1 KW - spoken interaction KW - string metrics KW - virtual conversational characters KW - serious games KW - e-learning N2 - Background: An emergency waiting room is a place where conflicts often arise. Nervous relatives in a hostile, unknown environment force security and medical staff to be ready to deal with some awkward situations. Additionally, it has been said that the medical interview is the first diagnostic and therapeutic tool, involving both intellectual and emotional skills on the part of the doctor. At the same time, it seems that there is something mysterious about interviewing that cannot be formalized or taught. 
In this context, virtual conversational characters (VCCs) are progressively present in most e-learning environments. Objective: In this study, we propose and develop a modular architecture for a VCC-based behavior simulator to be used as a tool for conflict avoidance training. Our behavior simulators are now being used in hospital environments, where training exercises must be easily designed and tested. Methods: We define training exercises as labeled, directed graphs that help an instructor in the design of complex training situations. In order to increase the perception of talking to a real person, the simulator must deal with a huge number of sentences that a VCC must understand and react to. These sentences are grouped into sets identified with a common label. Labels are then used to trigger changes in the active node of the graph that encodes the current state of the training exercise. As a consequence, we need to be able to map every sentence said by the human user into the set it belongs to, in a fast and robust way. In this work, we discuss two different existing string metrics, and compare them to one that we use to assess a designed exercise. Results: Based on the similarities found between different sets, the proposed metric provided valuable information about ill-defined exercises. We also described the environment in which our programs are being used and illustrated it with an example. Conclusions: Initially designed as a tool for training emergency room staff, our software could be of use in many other areas within the same environment. We are currently exploring the possibility of using it in speech therapy situations. 
UR - http://games.jmir.org/2020/1/e15349/ UR - http://dx.doi.org/10.2196/15349 UR - http://www.ncbi.nlm.nih.gov/pubmed/32130121 ID - info:doi/10.2196/15349 ER - TY - JOUR AU - Gabrielli, Silvia AU - Rizzi, Silvia AU - Carbone, Sara AU - Donisi, Valeria PY - 2020/2/14 TI - A Chatbot-Based Coaching Intervention for Adolescents to Promote Life Skills: Pilot Study JO - JMIR Hum Factors SP - e16762 VL - 7 IS - 1 KW - life skills KW - chatbots KW - conversational agents KW - mental health KW - participatory design KW - adolescence KW - bullying KW - cyberbullying KW - well-being intervention N2 - Background: Adolescence is a challenging period, where youth face rapid changes as well as increasing socioemotional demands and threats, such as bullying and cyberbullying. Adolescent mental health and well-being can be best supported by providing effective coaching on life skills, such as coping strategies and protective factors. Interventions that take advantage of online coaching by means of chatbots, deployed on Web or mobile technology, may be a novel and more appealing way to support positive mental health for adolescents. Objective: In this pilot study, we co-designed and conducted a formative evaluation of an online, life skills coaching, chatbot intervention, inspired by the positive technology approach, to promote mental well-being in adolescence. Methods: We co-designed the first life skills coaching session of the CRI (for girls) and CRIS (for boys) chatbot with 20 secondary school students in a participatory design workshop. We then conducted a formative evaluation of the entire intervention (eight sessions) with a convenience sample of 21 adolescents of both genders (mean age 14.52 years). Participants engaged with the chatbot sessions over 4 weeks and filled in an anonymous user experience questionnaire at the end of each session; responses were based on a 5-point Likert scale. 
Results: A majority of the adolescents found the intervention useful (16/21, 76%), easy to use (19/21, 90%), and innovative (17/21, 81%). Most of the participants (15/21, 71%) liked, in particular, the video cartoons provided by the chatbot in the coaching sessions. They also thought that a session should last only 5-10 minutes (14/21, 66%) and said they would recommend the intervention to a friend (20/21, 95%). Conclusions: We have presented a novel and scalable self-help intervention to deliver life skills coaching to adolescents online that is appealing to this population. This intervention can support the promotion of coping skills and mental well-being among youth. UR - http://humanfactors.jmir.org/2020/1/e16762/ UR - http://dx.doi.org/10.2196/16762 UR - http://www.ncbi.nlm.nih.gov/pubmed/32130128 ID - info:doi/10.2196/16762 ER - TY - JOUR AU - Kocaballi, Baki Ahmet AU - Quiroz, C. Juan AU - Rezazadegan, Dana AU - Berkovsky, Shlomo AU - Magrabi, Farah AU - Coiera, Enrico AU - Laranjo, Liliana PY - 2020/2/10 TI - Responses of Conversational Agents to Health and Lifestyle Prompts: Investigation of Appropriateness and Presentation Structures JO - J Med Internet Res SP - e15823 VL - 22 IS - 2 KW - conversational agents KW - chatbots KW - patient safety KW - health literacy KW - public health KW - design principles KW - evaluation N2 - Background: Conversational agents (CAs) are systems that mimic human conversations using text or spoken language. Their widely used examples include voice-activated systems such as Apple Siri, Google Assistant, Amazon Alexa, and Microsoft Cortana. The use of CAs in health care has been on the rise, but concerns about their potential safety risks often remain understudied. Objective: This study aimed to analyze how commonly available, general-purpose CAs on smartphones and smart speakers respond to health and lifestyle prompts (questions and open-ended statements) by examining their responses in terms of content and structure alike. 
Methods: We followed a piloted script to present health- and lifestyle-related prompts to 8 CAs. The CAs' responses were assessed for their appropriateness on the basis of the prompt type: responses to safety-critical prompts were deemed appropriate if they included a referral to a health professional or service, whereas responses to lifestyle prompts were deemed appropriate if they provided relevant information to address the problem prompted. The response structure was also examined according to information sources (Web search-based or precoded), response content style (informative and/or directive), confirmation of prompt recognition, and empathy. Results: The 8 studied CAs provided in total 240 responses to 30 prompts. They collectively responded appropriately to 41% (46/112) of the safety-critical and 39% (37/96) of the lifestyle prompts. The ratio of appropriate responses deteriorated when safety-critical prompts were rephrased or when the agent used a voice-only interface. The appropriate responses included mostly directive content and empathy statements for the safety-critical prompts and a mix of informative and directive content for the lifestyle prompts. Conclusions: Our results suggest that the commonly available, general-purpose CAs on smartphones and smart speakers with unconstrained natural language interfaces are limited in their ability to advise on both the safety-critical health prompts and lifestyle prompts. Our study also identified some response structures the CAs employed to present their appropriate responses. Further investigation is needed to establish guidelines for designing suitable response structures for different prompt types. UR - https://www.jmir.org/2020/2/e15823 UR - http://dx.doi.org/10.2196/15823 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/15823 ER - TY - JOUR AU - Kramer, L. Lean AU - ter Stal, Silke AU - Mulder, C. 
Bob AU - de Vet, Emely AU - van Velsen, Lex PY - 2020/2/5 TI - Developing Embodied Conversational Agents for Coaching People in a Healthy Lifestyle: Scoping Review JO - J Med Internet Res SP - e14058 VL - 22 IS - 2 KW - embodied conversational agent KW - virtual agent KW - lifestyle KW - health behavior KW - eHealth KW - chatbots N2 - Background: Embodied conversational agents (ECAs) are animated computer characters that simulate face-to-face counseling. Owing to their capacity to establish and maintain an empathic relationship, they are deemed to be a promising tool for starting and maintaining a healthy lifestyle. Objective: This review aimed to identify the current practices in designing and evaluating ECAs for coaching people in a healthy lifestyle and provide an overview of their efficacy (on behavioral, knowledge, and motivational parameters) and use (on usability, usage, and user satisfaction parameters). Methods: We used the Arksey and O'Malley framework to conduct a scoping review. PsycINFO, Medical Literature Analysis and Retrieval System Online, and Scopus were searched with a combination of terms related to ECA and lifestyle. Initially, 1789 unique studies were identified; 20 studies were included. Results: Most often, ECAs targeted physical activity (n=16) and had the appearance of a middle-aged African American woman (n=13). Multiple behavior change techniques (median=3) and theories or principles (median=3) were applied, but their interpretation and application were usually not reported. ECAs seemed to be designed for the end user rather than with the end user. Stakeholders were usually not involved. A total of 7 out of 15 studies reported better efficacy outcomes for the intervention group, and 5 out of 8 studies reported better use-related outcomes, as compared with the control group. Conclusions: ECAs are a promising tool for persuasive communication in the health domain. 
This review provided valuable insights into the current developmental processes, and it recommends the use of human-centered, stakeholder-inclusive design approaches, along with reporting on the design activities in a systematic and comprehensive manner. The gaps in knowledge were identified on the working mechanisms of intervention components and the right timing and frequency of coaching. UR - https://www.jmir.org/2020/2/e14058 UR - http://dx.doi.org/10.2196/14058 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/14058 ER - TY - JOUR AU - Holdener, Marianne AU - Gut, Alain AU - Angerer, Alfred PY - 2020/1/3 TI - Applicability of the User Engagement Scale to Mobile Health: A Survey-Based Quantitative Study JO - JMIR Mhealth Uhealth SP - e13244 VL - 8 IS - 1 KW - mobile health KW - mhealth KW - mobile apps KW - user engagement KW - measurement KW - user engagement scale KW - chatbot N2 - Background: There has recently been exponential growth in the development and use of health apps on mobile phones. As with most mobile apps, however, the majority of users abandon them quickly and after minimal use. One of the most critical factors for the success of a health app is how to support users' commitment to their health. Despite increased interest from researchers in mobile health, few studies have examined the measurement of user engagement with health apps. Objective: User engagement is a multidimensional, complex phenomenon. The aim of this study was to understand the concept of user engagement and, in particular, to demonstrate the applicability of a user engagement scale (UES) to mobile health apps. Methods: To determine the measurability of user engagement in a mobile health context, a UES was employed, which is a psychometric tool to measure user engagement with a digital system. This was adapted to Ada, developed by Ada Health, an artificial intelligence-powered personalized health guide that helps people understand their health. 
A principal component analysis (PCA) with varimax rotation was conducted on 30 items. In addition, sum scores as means of each subscale were calculated. Results: Survey data from 73 Ada users were analyzed. PCA was determined to be suitable, as verified by the sampling adequacy of Kaiser-Meyer-Olkin=0.858, a significant Bartlett test of sphericity (χ²300=1127.1; P<.001), and communalities mostly within the 0.7 range. Although 5 items had to be removed because of low factor loadings, the results of the remaining 25 items revealed 4 attributes: perceived usability, aesthetic appeal, reward, and focused attention. Ada users showed the highest engagement level with perceived usability, with a value of 294, followed by aesthetic appeal, reward, and focused attention. Conclusions: Although the UES was deployed in German and adapted to another digital domain, PCA yielded consistent subscales and a 4-factor structure. This indicates that user engagement with health apps can be assessed with the German version of the UES. These results can benefit related mobile health app engagement research and may be of importance to marketers and app developers. UR - https://mhealth.jmir.org/2020/1/e13244 UR - http://dx.doi.org/10.2196/13244 UR - http://www.ncbi.nlm.nih.gov/pubmed/31899454 ID - info:doi/10.2196/13244 ER - TY - JOUR AU - Martin-Hammond, Aqueasha AU - Vemireddy, Sravani AU - Rao, Kartik PY - 2019/12/11 TI - Exploring Older Adults' Beliefs About the Use of Intelligent Assistants for Consumer Health Information Management: A Participatory Design Study JO - JMIR Aging SP - e15381 VL - 2 IS - 2 KW - intelligent assistants KW - artificial intelligence KW - chatbots KW - conversational agents KW - digital health KW - elderly KW - aging in place KW - participatory design KW - co-design KW - health information seeking N2 - Background: Intelligent assistants (IAs), also known as intelligent agents, use artificial intelligence to help users achieve a goal or complete a task. 
IAs represent a potential solution for providing older adults with individualized assistance at home, for example, to reduce social isolation, serve as memory aids, or help with disease management. However, to design IAs for health that are beneficial and accepted by older adults, it is important to understand their beliefs about IAs, how they would like to interact with IAs for consumer health, and how they desire to integrate IAs into their homes. Objective: We explore older adults' mental models and beliefs about IAs, the tasks they want IAs to support, and how they would like to interact with IAs for consumer health. For the purpose of this study, we focus on IAs in the context of consumer health information management and search. Methods: We present findings from an exploratory, qualitative study that investigated older adults' perspectives of IAs that aid with consumer health information search and management tasks. Eighteen older adults participated in a multiphase, participatory design workshop in which we engaged them in discussion, brainstorming, and design activities that helped us identify their current challenges managing and finding health information at home. We also explored their beliefs and ideas for an IA to assist them with consumer health tasks. We used participatory design activities to identify areas in which they felt IAs might be useful, but also to uncover the reasoning behind the ideas they presented. Discussions were audio-recorded and later transcribed. We compiled design artifacts collected during the study to supplement researcher transcripts and notes. Thematic analysis was used to analyze data. Results: We found that participants saw IAs as potentially useful for providing recommendations, facilitating collaboration between themselves and other caregivers, and for alerts of serious illness. 
However, they also desired familiar and natural interactions with IAs (eg, using voice) that could, if need be, provide fluid and unconstrained interactions, reason about their symptoms, and provide information or advice. Other participants discussed the need for flexible IAs that could be used by those with low technical resources or skills. Conclusions: From our findings, we present a discussion of three key components of participants' mental models, including the people, behaviors, and interactions they described that were important for IAs for consumer health information management and seeking. We then discuss the role of access, transparency, caregivers, and autonomy in design for addressing participants' concerns about privacy and trust as well as its role in assisting others that may interact with an IA on the older adults' behalf. International Registered Report Identifier (IRRID): RR2-10.1145/3240925.3240972 UR - http://aging.jmir.org/2019/2/e15381/ UR - http://dx.doi.org/10.2196/15381 UR - http://www.ncbi.nlm.nih.gov/pubmed/31825322 ID - info:doi/10.2196/15381 ER - TY - JOUR AU - Bibault, Jean-Emmanuel AU - Chaix, Benjamin AU - Guillemassé, Arthur AU - Cousin, Sophie AU - Escande, Alexandre AU - Perrin, Morgane AU - Pienkowski, Arthur AU - Delamon, Guillaume AU - Nectoux, Pierre AU - Brouard, Benoît PY - 2019/11/27 TI - A Chatbot Versus Physicians to Provide Information for Patients With Breast Cancer: Blind, Randomized Controlled Noninferiority Trial JO - J Med Internet Res SP - e15787 VL - 21 IS - 11 KW - chatbot KW - clinical trial KW - cancer N2 - Background: The data regarding the use of conversational agents in oncology are scarce. Objective: The aim of this study was to verify whether an artificial conversational agent was able to provide answers to patients with breast cancer with a level of satisfaction similar to the answers given by a group of physicians.
Methods: This study is a blind, noninferiority randomized controlled trial that compared the information given by the chatbot, Vik, with that given by a multidisciplinary group of physicians to patients with breast cancer. Patients were women with breast cancer in treatment or in remission. The European Organisation for Research and Treatment of Cancer Quality of Life Group information questionnaire (EORTC QLQ-INFO25) was adapted and used to compare the quality of the information provided to patients by the physician or the chatbot. The primary outcome was to show that the answers given by the Vik chatbot to common questions asked by patients with breast cancer about their therapy management are at least as satisfying as answers given by a multidisciplinary medical committee by comparing the success rate in each group (defined by a score above 3). The secondary objective was to compare the average scores obtained by the chatbot and physicians for each INFO25 item. Results: A total of 142 patients were included and randomized into two groups of 71. They were all female with a mean age of 42 years (SD 19). The success rate (as defined by a score >3) was 69% (49/71) in the chatbot group versus 64% (46/71) in the physicians group. The binomial test showed the noninferiority (P<.001) of the chatbot's answers. Conclusions: This is the first study that assessed an artificial conversational agent used to inform patients with cancer. The EORTC INFO25 scores from the chatbot were found to be noninferior to the scores of the physicians. Artificial conversational agents may save patients with minor health concerns from a visit to the doctor. This could allow clinicians to spend more time to treat patients who need a consultation the most.
Trial Registration: Clinicaltrials.gov NCT03556813, https://tinyurl.com/rgtlehq UR - http://www.jmir.org/2019/11/e15787/ UR - http://dx.doi.org/10.2196/15787 UR - http://www.ncbi.nlm.nih.gov/pubmed/31774408 ID - info:doi/10.2196/15787 ER - TY - JOUR AU - Brar Prayaga, Rena AU - Agrawal, Ridhika AU - Nguyen, Benjamin AU - Jeong, W. Erwin AU - Noble, K. Harmony AU - Paster, Andrew AU - Prayaga, S. Ram PY - 2019/11/18 TI - Impact of Social Determinants of Health and Demographics on Refill Requests by Medicare Patients Using a Conversational Artificial Intelligence Text Messaging Solution: Cross-Sectional Study JO - JMIR Mhealth Uhealth SP - e15771 VL - 7 IS - 11 KW - text messaging KW - SMS KW - refill adherence KW - medication adherence KW - Medicare patients KW - conversational AI KW - social determinants of health KW - predictive modeling KW - machine learning KW - health disparities N2 - Background: Nonadherence among patients with chronic disease continues to be a significant concern, and the use of text message refill reminders has been effective in improving adherence. However, questions remain about how differences in patient characteristics and demographics might influence the likelihood of refill using this channel. Objective: The aim of this study was to evaluate the efficacy of an SMS-based refill reminder solution using conversational artificial intelligence (AI; an automated system that mimics human conversations) with a large Medicare patient population and to explore the association and impact of patient demographics (age, gender, race/ethnicity, language) and social determinants of health on successful engagement with the solution to improve refill adherence. Methods: The study targeted 99,217 patients with chronic disease, median age of 71 years, for medication refill using the mPulse Mobile interactive SMS text messaging solution from December 2016 to February 2019. 
All patients were partially adherent or nonadherent Medicare Part D members of Kaiser Permanente, Southern California, a large integrated health plan. Patients received SMS reminders in English or Spanish and used simple numeric or text responses to validate their identity, view their medication, and complete a refill request. The refill requests were processed by Kaiser Permanente pharmacists and support staff, and refills were picked up at the pharmacy or mailed to patients. Descriptive statistics and predictive analytics were used to examine the patient population and their refill behavior. Qualitative text analysis was used to evaluate quality of conversational AI. Results: Over the course of the study, 273,356 refill reminder requests were sent to 99,217 patients, resulting in 47,552 refill requests (17.40%). This was consistent with earlier pilot study findings. Of those who requested a refill, 54.81% (26,062/47,552) did so within 2 hours of the reminder. There was a strong inverse relationship (r10=−0.93) between social determinants of health and refill requests. Spanish speakers (5149/48,156, 10.69%) had significantly lower refill request rates compared with English speakers (42,389/225,060, 18.83%; χ²1 [n=273,216]=1829.2; P<.001). There were also significantly different rates of refill requests by age band (χ²6 [n=268,793]=1460.3; P<.001), with younger patients requesting refills at a higher rate. Finally, the vast majority (284,598/307,484, 92.23%) of patient responses were handled using conversational AI. Conclusions: Multiple factors impacted refill request rates, including a strong association between social determinants of health and refill rates.
The findings suggest that higher refill requests are linked to language, race/ethnicity, age, and social determinants of health, and that English speakers, whites, those younger than 75 years, and those with lower social determinants of health barriers are significantly more likely to request a refill via SMS. A neural network-based predictive model with an accuracy level of 78% was used to identify patients who might benefit from additional outreach to narrow identified gaps based on demographic and socioeconomic factors. UR - http://mhealth.jmir.org/2019/11/e15771/ UR - http://dx.doi.org/10.2196/15771 UR - http://www.ncbi.nlm.nih.gov/pubmed/31738170 ID - info:doi/10.2196/15771 ER - TY - JOUR AU - Xing, Zhaopeng AU - Yu, Fei AU - Du, Jian AU - Walker, S. Jennifer AU - Paulson, B. Claire AU - Mani, S. Nandita AU - Song, Lixin PY - 2019/11/18 TI - Conversational Interfaces for Health: Bibliometric Analysis of Grants, Publications, and Patents JO - J Med Internet Res SP - e14672 VL - 21 IS - 11 KW - conversational interfaces KW - conversational agents KW - chatbots KW - artificial intelligence KW - healthcare KW - bibliometrics KW - social network KW - grants KW - publications KW - patents N2 - Background: Conversational interfaces (CIs) in different modalities have been developed for health purposes, such as health behavioral intervention, patient self-management, and clinical decision support. Despite growing research evidence supporting CIs' potential, CI-related research is still in its infancy. There is a lack of systematic investigation that goes beyond publication review and presents the state of the art from perspectives of funding agencies, academia, and industry by incorporating CI-related public funding and patent activities.
Objective: This study aimed to use data systematically extracted from multiple sources (ie, grant, publication, and patent databases) to investigate the development, research, and fund application of health-related CIs and associated stakeholders (ie, countries, organizations, and collaborators). Methods: A multifaceted search query was executed to retrieve records from 9 databases. Bibliometric analysis, social network analysis, and term co-occurrence analysis were conducted on the screened records. Results: This review included 42 funded projects, 428 research publications, and 162 patents. The total dollar amount of grants awarded was US $30,297,932, of which US $13,513,473 was awarded by US funding agencies and US $16,784,459 was funded by the European Commission. The top 3 funding agencies in the United States were the National Science Foundation, National Institutes of Health, and Agency for Healthcare Research and Quality. Boston Medical Center was awarded the largest combined grant size (US $2,246,437) for 4 projects. The authors of the publications were from 58 countries and 566 organizations; the top 3 most productive organizations were Northeastern University (United States), Universiti Teknologi MARA (Malaysia), and the French National Center for Scientific Research (CNRS; France). US researchers produced 114 publications. Although 82.0% (464/566) of the organizations engaged in interorganizational collaboration, 2 organizational research-collaboration clusters were observed with Northeastern University and CNRS as the central nodes. About 112 organizations from the United States and China filed 87.7% of the patents. IBM filed the most patents (N=17). Only 5 patents were co-owned by different organizations, and there was no across-country collaboration on patenting activity. The terms patient, child, elderly, and robot were frequently discussed in the 3 record types. The terms related to mental and chronic issues were discussed mainly in grants and publications.
The terms regarding multimodal interactions were widely mentioned as users' communication modes with CIs in the identified records. Conclusions: Our findings provided an overview of the countries, organizations, and topic terms in funded projects, as well as the authorship, collaboration, content, and related information of research publications and patents. There is a lack of broad cross-sector partnerships among grant agencies, academia, and industry, particularly in the United States. Our results suggest a need to improve collaboration among public and private sectors and health care organizations in research and patent activities. UR - http://www.jmir.org/2019/11/e14672/ UR - http://dx.doi.org/10.2196/14672 UR - http://www.ncbi.nlm.nih.gov/pubmed/31738171 ID - info:doi/10.2196/14672 ER - TY - JOUR AU - Kocaballi, Baki Ahmet AU - Berkovsky, Shlomo AU - Quiroz, C. Juan AU - Laranjo, Liliana AU - Tong, Ly Huong AU - Rezazadegan, Dana AU - Briatore, Agustina AU - Coiera, Enrico PY - 2019/11/7 TI - The Personalization of Conversational Agents in Health Care: Systematic Review JO - J Med Internet Res SP - e15360 VL - 21 IS - 11 KW - conversational interfaces KW - conversational agents KW - dialogue systems KW - personalization KW - customization KW - adaptive systems KW - health care N2 - Background: The personalization of conversational agents with natural language user interfaces is seeing increasing use in health care applications, shaping the content, structure, or purpose of the dialogue between humans and conversational agents. Objective: The goal of this systematic review was to understand the ways in which personalization has been used with conversational agents in health care and characterize the methods of its implementation. Methods: We searched on PubMed, Embase, CINAHL, PsycInfo, and ACM Digital Library using a predefined search strategy.
The studies were included if they: (1) were primary research studies that focused on consumers, caregivers, or health care professionals; (2) involved a conversational agent with an unconstrained natural language interface; (3) tested the system with human subjects; and (4) implemented personalization features. Results: The search found 1958 publications. After abstract and full-text screening, 13 studies were included in the review. Common examples of personalized content included feedback, daily health reports, alerts, warnings, and recommendations. The personalization features were implemented without a theoretical framework of customization and with limited evaluation of its impact. While conversational agents with personalization features were reported to improve user satisfaction, user engagement, and dialogue quality, the role of personalization in improving health outcomes was not assessed directly. Conclusions: Most of the studies in our review implemented the personalization features without theoretical or evidence-based support for them and did not leverage the recent developments in other domains of personalization. Future research could incorporate personalization as a distinct design factor with a more careful consideration of its impact on health outcomes and its implications on patient safety, privacy, and decision-making. UR - https://www.jmir.org/2019/11/e15360 UR - http://dx.doi.org/10.2196/15360 UR - http://www.ncbi.nlm.nih.gov/pubmed/31697237 ID - info:doi/10.2196/15360 ER - TY - JOUR AU - Greer, Stephanie AU - Ramo, Danielle AU - Chang, Yin-Juei AU - Fu, Michael AU - Moskowitz, Judith AU - Haritatos, Jana PY - 2019/10/31 TI - Use of the Chatbot "Vivibot"
to Deliver Positive Psychology Skills and Promote Well-Being Among Young People After Cancer Treatment: Randomized Controlled Feasibility Trial JO - JMIR Mhealth Uhealth SP - e15018 VL - 7 IS - 10 KW - chatbot KW - positive psychology KW - young adult KW - cancer N2 - Background: Positive psychology interventions show promise for reducing psychosocial distress associated with health adversity and have the potential to be widely disseminated to young adults through technology. Objective: This pilot randomized controlled trial examined the feasibility of delivering positive psychology skills via the Vivibot chatbot and its effects on key psychosocial well-being outcomes in young adults treated for cancer. Methods: Young adults (age 18-29 years) were recruited within 5 years of completing active cancer treatment by using the Vivibot chatbot on Facebook messenger. Participants were randomized to either immediate access to Vivibot content (experimental group) or access to only daily emotion ratings and access to full chatbot content after 4 weeks (control). Created using a human-centered design process with young adults treated for cancer, Vivibot content includes 4 weeks of positive psychology skills, daily emotion ratings, video, and other material produced by survivors, and periodic feedback check-ins. All participants were assessed for psychosocial well-being via online surveys at baseline and weeks 2, 4, and 8. Analyses examined chatbot engagement and open-ended feedback on likability and perceived helpfulness and compared experimental and control groups with regard to anxiety and depression symptoms and positive and negative emotion changes between baseline and 4 weeks. To verify the main effects, follow-up analyses compared changes in the main outcomes between 4 and 8 weeks in the control group once participants had access to all chatbot content. 
Results: Data from 45 young adults (36 women; mean age: 25 [SD 2.9]; experimental group: n=25; control group: n=20) were analyzed. Participants in the experimental group spent an average of 74 minutes across an average of 12 active sessions chatting with Vivibot and rated their experience as helpful (mean 2.0/3, SD 0.72) and would recommend it to a friend (mean 6.9/10; SD 2.6). Open-ended feedback noted its nonjudgmental nature as a particular benefit of the chatbot. After 4 weeks, participants in the experimental group reported an average reduction in anxiety of 2.58 standardized t-score units, while the control group reported an increase in anxiety of 0.7 units. A mixed-effects model revealed a trend-level (P=.09) interaction between group and time, with an effect size of 0.41. Those in the experimental group also experienced greater reductions in anxiety when they engaged in more sessions (z=−1.9, P=.06). There were no significant (or trend level) effects by group on changes in depression, positive emotion, or negative emotion. Conclusions: The chatbot format provides a useful and acceptable way of delivering positive psychology skills to young adults who have undergone cancer treatment and supports anxiety reduction. Further analysis with a larger sample size is required to confirm this pattern.
UR - http://mhealth.jmir.org/2019/10/e15018/ UR - http://dx.doi.org/10.2196/15018 UR - http://www.ncbi.nlm.nih.gov/pubmed/31674920 ID - info:doi/10.2196/15018 ER - TY - JOUR AU - Jungmann, Maria Stefanie AU - Klan, Timo AU - Kuhn, Sebastian AU - Jungmann, Florian PY - 2019/10/29 TI - Accuracy of a Chatbot (Ada) in the Diagnosis of Mental Disorders: Comparative Case Study With Lay and Expert Users JO - JMIR Form Res SP - e13863 VL - 3 IS - 4 KW - artificial intelligence KW - eHealth KW - mental disorders KW - mHealth KW - screening KW - (mobile) app KW - diagnostic N2 - Background: Health apps for the screening and diagnosis of mental disorders have emerged in recent years on various levels (eg, patients, practitioners, and public health system). However, the diagnostic quality of these apps has not been (sufficiently) tested so far. Objective: The objective of this pilot study was to investigate the diagnostic quality of a health app for a broad spectrum of mental disorders and its dependency on expert knowledge. Methods: Two psychotherapists, two psychology students, and two laypersons each read 20 case vignettes with a broad spectrum of mental disorders. They used a health app (Ada–Your Health Guide) to get a diagnosis by entering the symptoms. Interrater reliabilities were computed between the diagnoses of the case vignettes and the results of the app for each user group. Results: Overall, there was a moderate diagnostic agreement (kappa=0.64) between the results of the app and the case vignettes for mental disorders in adulthood and a low diagnostic agreement (kappa=0.40) for mental disorders in childhood and adolescence. When psychotherapists applied the app, there was a good diagnostic agreement (kappa=0.78) regarding mental disorders in adulthood. The diagnostic agreement was moderate (kappa=0.55/0.60) for students and laypersons.
For mental disorders in childhood and adolescence, a moderate diagnostic quality was found when psychotherapists (kappa=0.53) and students (kappa=0.41) used the app, whereas the quality was low for laypersons (kappa=0.29). On average, the app required 34 questions to be answered and 7 min to complete. Conclusions: The health app investigated here can represent an efficient diagnostic screening or help function for mental disorders in adulthood and has the potential to support especially diagnosticians in their work in various ways. The results of this pilot study provide a first indication that the diagnostic accuracy is user dependent and improvements in the app are needed especially for mental disorders in childhood and adolescence. UR - http://formative.jmir.org/2019/4/e13863/ UR - http://dx.doi.org/10.2196/13863 UR - http://www.ncbi.nlm.nih.gov/pubmed/31663858 ID - info:doi/10.2196/13863 ER - TY - JOUR AU - Powell, John PY - 2019/10/28 TI - Trust Me, I'm a Chatbot: How Artificial Intelligence in Health Care Fails the Turing Test JO - J Med Internet Res SP - e16222 VL - 21 IS - 10 KW - artificial intelligence KW - machine learning KW - medical informatics KW - digital health KW - ehealth KW - chatbots KW - conversational agents UR - http://www.jmir.org/2019/10/e16222/ UR - http://dx.doi.org/10.2196/16222 UR - http://www.ncbi.nlm.nih.gov/pubmed/31661083 ID - info:doi/10.2196/16222 ER - TY - JOUR AU - Gaffney, Hannah AU - Mansell, Warren AU - Tai, Sara PY - 2019/10/18 TI - Conversational Agents in the Treatment of Mental Health Problems: Mixed-Method Systematic Review JO - JMIR Ment Health SP - e14166 VL - 6 IS - 10 KW - artificial intelligence KW - mental health KW - stress, psychological KW - psychiatry KW - therapy, computer-assisted KW - conversational agent KW - chatbot KW - digital health N2 - Background: The use of conversational agent interventions (including chatbots and robots) in mental health is growing at a fast pace.
Recent reviews have focused exclusively on a subset of embodied conversational agent interventions despite other modalities aiming to achieve the common goal of improved mental health. Objective: This study aimed to review the use of conversational agent interventions in the treatment of mental health problems. Methods: We performed a systematic search using relevant databases (MEDLINE, EMBASE, PsycINFO, Web of Science, and Cochrane library). Studies that reported on an autonomous conversational agent that simulated conversation and reported on a mental health outcome were included. Results: A total of 13 studies were included in the review. Among them, 4 full-scale randomized controlled trials (RCTs) were included. The rest were feasibility studies, pilot RCTs, and quasi-experimental studies. Interventions were diverse in design and targeted a range of mental health problems using a wide variety of therapeutic orientations. All included studies reported reductions in psychological distress postintervention. Furthermore, 5 controlled studies demonstrated significant reductions in psychological distress compared with inactive control groups. In addition, 3 controlled studies comparing interventions with active control groups failed to demonstrate superior effects. Broader utility in promoting well-being in nonclinical populations was unclear. Conclusions: The efficacy and acceptability of conversational agent interventions for mental health problems are promising. However, a more robust experimental design is required to demonstrate efficacy and efficiency. A focus on streamlining interventions, demonstrating equivalence to other treatment modalities, and elucidating mechanisms of action has the potential to increase acceptance by users and clinicians and maximize reach.
UR - https://mental.jmir.org/2019/10/e14166 UR - http://dx.doi.org/10.2196/14166 UR - http://www.ncbi.nlm.nih.gov/pubmed/31628789 ID - info:doi/10.2196/14166 ER - TY - JOUR AU - Bott, Nicholas AU - Wexler, Sharon AU - Drury, Lin AU - Pollak, Chava AU - Wang, Victor AU - Scher, Kathleen AU - Narducci, Sharon PY - 2019/10/17 TI - A Protocol-Driven, Bedside Digital Conversational Agent to Support Nurse Teams and Mitigate Risks of Hospitalization in Older Adults: Case Control Pre-Post Study JO - J Med Internet Res SP - e13440 VL - 21 IS - 10 KW - digital health KW - older adults KW - loneliness KW - delirium KW - falls KW - embodied conversational agent KW - chatbot KW - relational agent KW - information and communication technology N2 - Background: Hospitalized older adults often experience isolation and disorientation while receiving care, placing them at risk for many inpatient complications, including loneliness, depression, delirium, and falls. Embodied conversational agents (ECAs) are technological entities that can interact with people through spoken conversation. Some ECAs are also relational agents, which build and maintain socioemotional relationships with people across multiple interactions. This study utilized a novel form of relational ECA, provided by Care Coach (care.coach, inc): an animated animal avatar on a tablet device, monitored and controlled by live health advocates. The ECA implemented algorithm-based clinical protocols for hospitalized older adults, such as reorienting patients to mitigate delirium risk, eliciting toileting needs to prevent falls, and engaging patients in social interaction to facilitate social engagement. Previous pilot studies of the Care Coach avatar have demonstrated the ECA's usability and efficacy in home-dwelling older adults. Further study among hospitalized older adults in a larger experimental trial is needed to demonstrate its effectiveness.
Objective: The aim of the study was to examine the effect of a human-in-the-loop, protocol-driven relational ECA on loneliness, depression, delirium, and falls among diverse hospitalized older adults. Methods: This was a clinical trial of 95 adults over the age of 65 years, hospitalized at an inner-city community hospital. Intervention participants received an avatar for the duration of their hospital stay; participants on a control unit received a daily 15-min visit from a nursing student. Measures of loneliness (3-item University of California, Los Angeles Loneliness Scale), depression (15-item Geriatric Depression Scale), and delirium (confusion assessment method) were administered upon study enrollment and before discharge. Results: Participants who received the avatar during hospitalization had lower frequency of delirium at discharge (P<.001), reported fewer symptoms of loneliness (P=.01), and experienced fewer falls than control participants. There were no significant differences in self-reported depressive symptoms. Conclusions: The study findings validate the use of human-in-the-loop, relational ECAs among diverse hospitalized older adults. UR - http://www.jmir.org/2019/10/e13440/ UR - http://dx.doi.org/10.2196/13440 UR - http://www.ncbi.nlm.nih.gov/pubmed/31625949 ID - info:doi/10.2196/13440 ER - TY - JOUR PY - 2019// TI - Roles of Health Literacy in Relation to Social Determinants of Health and Recommendations for Informatics-Based Interventions: Systematic Review JO - Online J Public Health Inform SP - e9998 VL - 11 IS - 2 UR - http://dx.doi.org/10.5210/ojphi.v11i2.9998 ID - info:doi/10.5210/ojphi.v11i2.9998 ER - TY - JOUR AU - Tanana, J. Michael AU - Soma, S. Christina AU - Srikumar, Vivek AU - Atkins, C. David AU - Imel, E.
Zac PY - 2019/07/15 TI - Development and Evaluation of ClientBot: Patient-Like Conversational Agent to Train Basic Counseling Skills JO - J Med Internet Res SP - e12529 VL - 21 IS - 7 KW - psychotherapy training KW - interactive learning KW - conversational agents KW - deep learning N2 - Background: Training therapists is both expensive and time-consuming. Degree-based training can require tens of thousands of dollars and hundreds of hours of expert instruction. Counseling skills practice often involves role-plays, standardized patients, or practice with real clients. Performance-based feedback is critical for skill development and expertise, but trainee therapists often receive minimal and subjective feedback, which is distal to their skill practice. Objective: In this study, we developed and evaluated a patient-like neural conversational agent, which provides real-time feedback to trainees via chat-based interaction. Methods: The text-based conversational agent was trained on an archive of 2354 psychotherapy transcripts and provided specific feedback on the use of basic interviewing and counseling skills (ie, open questions and reflections [summary statements of what a client has said]). A total of 151 nontherapists were randomized to either (1) immediate feedback on their use of open questions and reflections during practice session with ClientBot or (2) initial education and encouragement on the skills. Results: Participants in the ClientBot condition used 91% (21.4/11.2) more reflections during practice with feedback (P<.001) and 76% (14.1/8) more reflections after feedback was removed (P<.001) relative to the control group. The treatment group used more open questions during training but not after feedback was removed, suggesting that certain skills may not improve with performance-based feedback. Finally, after feedback was removed, the ClientBot group used 31% (32.5/24.7) more listening skills overall (P<.001).
Conclusions: This proof-of-concept study demonstrates that practice and feedback can improve trainee use of basic counseling skills. UR - https://www.jmir.org/2019/7/e12529/ UR - http://dx.doi.org/10.2196/12529 UR - http://www.ncbi.nlm.nih.gov/pubmed/31309929 ID - info:doi/10.2196/12529 ER - TY - JOUR AU - Loveys, Kate AU - Fricchione, Gregory AU - Kolappa, Kavitha AU - Sagar, Mark AU - Broadbent, Elizabeth PY - 2019/07/08 TI - Reducing Patient Loneliness With Artificial Agents: Design Insights From Evolutionary Neuropsychiatry JO - J Med Internet Res SP - e13664 VL - 21 IS - 7 KW - loneliness KW - neuropsychiatry KW - biological evolution KW - psychological bonding KW - interpersonal relations KW - artificial intelligence KW - social support KW - eHealth UR - https://www.jmir.org/2019/7/e13664/ UR - http://dx.doi.org/10.2196/13664 UR - http://www.ncbi.nlm.nih.gov/pubmed/31287067 ID - info:doi/10.2196/13664 ER - TY - JOUR AU - Easton, Katherine AU - Potter, Stephen AU - Bec, Remi AU - Bennion, Matthew AU - Christensen, Heidi AU - Grindell, Cheryl AU - Mirheidari, Bahman AU - Weich, Scott AU - de Witte, Luc AU - Wolstenholme, Daniel AU - Hawley, S. Mark PY - 2019/05/30 TI - A Virtual Agent to Support Individuals Living With Physical and Mental Comorbidities: Co-Design and Acceptability Testing JO - J Med Internet Res SP - e12996 VL - 21 IS - 5 KW - COPD KW - chronic obstructive pulmonary disease KW - mental health KW - comorbidity KW - chronic illness KW - self-management KW - artificial intelligence KW - virtual systems KW - computer-assisted therapy KW - chatbot KW - conversational agent N2 - Background: Individuals living with long-term physical health conditions frequently experience co-occurring mental health problems. This comorbidity has a significant impact on an individual's levels of emotional distress, health outcomes, and associated health care utilization.
As health care services struggle to meet demand and care increasingly moves to the community, digital tools are being promoted to support patients to self-manage their health. One such technology is the autonomous virtual agent (chatbot, conversational agent), which uses artificial intelligence (AI) to process the user's written or spoken natural language and then to select or construct the corresponding appropriate responses. Objective: This study aimed to co-design the content, functionality, and interface modalities of an autonomous virtual agent to support self-management for patients with an exemplar long-term condition (LTC; chronic obstructive pulmonary disease [COPD]) and then to assess the acceptability and system content. Methods: We conducted 2 co-design workshops and a proof-of-concept implementation of an autonomous virtual agent with natural language processing capabilities. This implementation formed the basis for video-based scenario testing of acceptability with adults with a diagnosis of COPD and health professionals involved in their care. Results: Adults (n=6) with a diagnosis of COPD and health professionals (n=5) specified 4 priority self-management scenarios for which they would like to receive support: at the time of diagnosis (information provision), during acute exacerbations (crisis support), during periods of low mood (emotional support), and for general self-management (motivation). From the scenario testing, 12 additional adults with COPD felt the system to be both acceptable and engaging, particularly with regard to internet-of-things capabilities. They felt the system would be particularly useful for individuals living alone. Conclusions: Patients did not explicitly separate mental and physical health needs, although the content they developed for the virtual agent had a clear psychological approach. Supported self-management delivered via an autonomous virtual agent was acceptable to the participants.
A co-design process has allowed the research team to identify key design principles, content, and functionality to underpin an autonomous agent for delivering self-management support to older adults living with COPD and potentially other LTCs. UR - http://www.jmir.org/2019/5/e12996/ UR - http://dx.doi.org/10.2196/12996 UR - http://www.ncbi.nlm.nih.gov/pubmed/31148545 ID - info:doi/10.2196/12996 ER - TY - JOUR AU - Robinson, Lee Nicole AU - Cottier, Vaughan Timothy AU - Kavanagh, John David PY - 2019/05/10 TI - Psychosocial Health Interventions by Social Robots: Systematic Review of Randomized Controlled Trials JO - J Med Internet Res SP - e13203 VL - 21 IS - 5 KW - social robot KW - healthcare KW - treatment KW - therapy KW - autism spectrum disorder KW - dementia N2 - Background: Social robots that can communicate and interact with people offer exciting opportunities for improved health care access and outcomes. However, evidence from randomized controlled trials (RCTs) on health or well-being outcomes has not yet been clearly synthesized across all health domains where social robots have been tested. Objective: This study aimed to undertake a systematic review examining current evidence from RCTs on the effects of psychosocial interventions by social robots on health or well-being. Methods: Medline, PsycInfo, ScienceDirect, Scopus, and Engineering Village searches across all years in the English language were conducted and supplemented by forward and backward searches. The included papers reported RCTs that assessed changes in health or well-being from interactions with a social robot across at least 2 measurement occasions. Results: Out of 408 extracted records, 27 trials met the inclusion criteria: 6 in child health or well-being, 9 in children with autism spectrum disorder, and 12 with older adults. 
No trials on adolescents, young adults, or other problem areas were identified, and no studies had interventions where robots spontaneously modified verbal responses based on speech by participants. Most trials were small (total N=5 to 415; median=34); only 6 (22%) reported any follow-up outcomes (2 to 12 weeks; median=3.5), and a single-blind assessment was reported in 8 (31%). More recent trials tended to have greater methodological quality. All papers reported some positive outcomes from robotic interventions, although most trials had some measures that showed no difference or favored alternate treatments. Conclusions: Controlled research on social robots is at an early stage, as is the current range of their applications to health care. Research on social robot interventions in clinical and health settings needs to transition from exploratory investigations to include large-scale controlled trials with sophisticated methodology, to increase confidence in their efficacy. UR - http://www.jmir.org/2019/5/e13203/ UR - http://dx.doi.org/10.2196/13203 UR - http://www.ncbi.nlm.nih.gov/pubmed/31094357 ID - info:doi/10.2196/13203 ER - TY - JOUR AU - Chaix, Benjamin AU - Bibault, Jean-Emmanuel AU - Pienkowski, Arthur AU - Delamon, Guillaume AU - Guillemassé, Arthur AU - Nectoux, Pierre AU - Brouard, Benoît PY - 2019/05/02 TI - When Chatbots Meet Patients: One-Year Prospective Study of Conversations Between Patients With Breast Cancer and a Chatbot JO - JMIR Cancer SP - e12856 VL - 5 IS - 1 KW - artificial intelligence KW - breast cancer KW - mobile phone KW - patient-reported outcomes KW - symptom management KW - chatbot KW - conversational agent N2 - Background: A chatbot is a software program that interacts with users by simulating a human conversation through text or voice via smartphones or computers. It could be a solution to follow up with patients during their disease while saving time for health care providers. 
Objective: The aim of this study was to evaluate one year of conversations between patients with breast cancer and a chatbot. Methods: Wefight Inc designed a chatbot (Vik) to empower patients with breast cancer and their relatives. Vik responds to the fears and concerns of patients with breast cancer using personalized insights through text messages. We conducted a prospective study by analyzing the users' and patients' data, their usage duration, their interest in the various educational contents proposed, and their level of interactivity. Patients were women with breast cancer or in remission. Results: A total of 4737 patients were included. Results showed that an average of 132,970 messages per month were exchanged between patients and the chatbot, Vik. Thus, we calculated the average medication adherence rate over 4 weeks by using a prescription reminder function, and we showed that the more the patients used the chatbot, the more adherent they were. Patients regularly left positive comments and recommended Vik to their friends. The overall satisfaction was 93.95% (900/958). When asked what Vik meant to them and what Vik brought them, 88.00% (943/958) said that Vik provided them with support and helped them track their treatment effectively. Conclusions: We demonstrated that it is possible to obtain support through a chatbot since Vik improved the medication adherence rate of patients with breast cancer. UR - http://cancer.jmir.org/2019/1/e12856/ UR - http://dx.doi.org/10.2196/12856 UR - http://www.ncbi.nlm.nih.gov/pubmed/31045505 ID - info:doi/10.2196/12856 ER - TY - JOUR AU - Green, P. Eric AU - Pearson, Nicholas AU - Rajasekharan, Sathyanath AU - Rauws, Michiel AU - Joerin, Angela AU - Kwobah, Edith AU - Musyimi, Christine AU - Bhat, Chaya AU - Jones, M. 
Rachel AU - Lai, Yihuan PY - 2019/04/29 TI - Expanding Access to Depression Treatment in Kenya Through Automated Psychological Support: Protocol for a Single-Case Experimental Design Pilot Study JO - JMIR Res Protoc SP - e11800 VL - 8 IS - 4 KW - telemedicine KW - mental health KW - depression KW - artificial intelligence KW - Kenya KW - text messaging KW - chatbot KW - conversational agent N2 - Background: Depression during pregnancy and in the postpartum period is associated with a number of poor outcomes for women and their children. Although effective interventions exist for common mental disorders that occur during pregnancy and the postpartum period, most cases in low- and middle-income countries go untreated because of a lack of trained professionals. Task-sharing models such as the Thinking Healthy Program have shown great potential in feasibility and efficacy trials as a strategy for expanding access to treatment in low-resource settings, but there are significant barriers to scale-up. We are addressing this gap by adapting Thinking Healthy for automated delivery via a mobile phone. This new intervention, Healthy Moms, uses an existing artificial intelligence system called Tess (Zuri in Kenya) to drive conversations with users. Objective: The objective of this pilot study is to test the Healthy Moms perinatal depression intervention using a single-case experimental design with pregnant women and new mothers recruited from public hospitals outside of Nairobi, Kenya. Methods: We will invite patients to complete a brief, automated screening delivered via text messages to determine their eligibility. Enrolled participants will be randomized to a 1- or 2-week baseline period and then invited to begin using Zuri. Participants will be prompted to rate their mood via short message service every 3 days during the baseline and intervention periods. 
We will review system logs and conduct in-depth interviews with participants to study engagement with the intervention, feasibility, and acceptability. We will use visual inspection, in-depth interviews, and Bayesian estimation to generate preliminary data about the potential response to treatment. Results: Our team adapted the intervention content in April and May 2018 and completed an initial prepilot round of formative testing with 10 women from a private maternity hospital in May and June. In preparation for this pilot study, we used feedback from these users to revise the structure and content of the intervention. Recruitment for this protocol began in early 2019. Results are expected toward the end of 2019. Conclusions: The main limitation of this pilot study is that we will recruit women who live in urban and periurban centers in one part of Kenya. The results of this study may not generalize to the broader population of Kenyan women, but that is not an objective of this phase of work. Our primary objective is to gather preliminary data to know how to build and test a more robust service. We are working toward a larger study with a more diverse population. 
International Registered Report Identifier (IRRID): DERR1-10.2196/11800 UR - http://www.researchprotocols.org/2019/4/e11800/ UR - http://dx.doi.org/10.2196/11800 UR - http://www.ncbi.nlm.nih.gov/pubmed/31033448 ID - info:doi/10.2196/11800 ER - TY - JOUR AU - Park, SoHyun AU - Choi, Jeewon AU - Lee, Sungwoo AU - Oh, Changhoon AU - Kim, Changdai AU - La, Soohyun AU - Lee, Joonhwan AU - Suh, Bongwon PY - 2019/04/16 TI - Designing a Chatbot for a Brief Motivational Interview on Stress Management: Qualitative Case Study JO - J Med Internet Res SP - e12231 VL - 21 IS - 4 KW - motivational interviewing KW - mental health KW - conversational agents KW - stress management N2 - Background: In addition to addiction and substance abuse, motivational interviewing (MI) is increasingly being integrated in treating other clinical issues such as mental health problems. Most of the many technological adaptations of MI, however, have focused on delivering the action-oriented treatment, leaving its relational component unexplored or vaguely described. This study intended to design a conversational sequence that considers both technical and relational components of MI for a mental health concern. Objective: This case study aimed to design a conversational sequence for a brief motivational interview to be delivered by a Web-based text messaging application (chatbot) and to investigate its conversational experience with graduate students in their coping with stress. Methods: A brief conversational sequence was designed with varied combinations of MI skills to follow the 4 processes of MI. A Web-based text messaging application, Bonobot, was built as a research prototype to deliver the sequence in a conversation. A total of 30 full-time graduate students who self-reported stress with regard to their school life were recruited for a survey of demographic information and perceived stress and a semistructured interview. 
Interviews were transcribed verbatim and analyzed by Braun and Clarke's thematic method. The themes that reflect the process of, impact of, and needs for the conversational experience are reported. Results: Participants had a high level of perceived stress (mean 22.5 [SD 5.0]). Our findings included the following themes: Evocative Questions and Clichéd Feedback; Self-Reflection and Potential Consolation; and Need for Information and Contextualized Feedback. Participants particularly favored the relay of evocative questions but were less satisfied with the agent-generated reflective and affirming feedback that filled in-between. Discussing the idea of change was a good means of self-reflection, and some of Bonobot's encouragements related to graduate school life were appreciated. Participants suggested the conversation provide informational support, as well as more contextualized feedback. Conclusions: A conversational sequence for a brief motivational interview was presented in this case study. Participant feedback suggests sequencing questions and MI-adherent statements can facilitate a conversation for stress management, which may encourage a chance of self-reflection. More diversified sequences, along with more contextualized feedback, should follow to offer a better conversational experience and to confirm any empirical effect. UR - https://www.jmir.org/2019/4/e12231/ UR - http://dx.doi.org/10.2196/12231 UR - http://www.ncbi.nlm.nih.gov/pubmed/30990463 ID - info:doi/10.2196/12231 ER - TY - JOUR AU - Palanica, Adam AU - Flaschner, Peter AU - Thommandram, Anirudh AU - Li, Michael AU - Fossat, Yan PY - 2019/04/05 TI - Physicians' 
Perceptions of Chatbots in Health Care: Cross-Sectional Web-Based Survey JO - J Med Internet Res SP - e12887 VL - 21 IS - 4 KW - physician satisfaction KW - health care KW - telemedicine KW - mobile health KW - health surveys N2 - Background: Many potential benefits for the uses of chatbots within the context of health care have been theorized, such as improved patient education and treatment compliance. However, little is known about the perspectives of practicing medical physicians on the use of chatbots in health care, even though these individuals are the traditional benchmark of proper patient care. Objective: This study aimed to investigate the perceptions of physicians regarding the use of health care chatbots, including their benefits, challenges, and risks to patients. Methods: A total of 100 practicing physicians across the United States completed a Web-based, self-report survey to examine their opinions of chatbot technology in health care. Descriptive statistics and frequencies were used to examine the characteristics of participants. Results: A wide variety of positive and negative perspectives were reported on the use of health care chatbots, including the importance to patients for managing their own health and the benefits on physical, psychological, and behavioral health outcomes. More consistent agreement occurred with regard to administrative benefits associated with chatbots; many physicians believed that chatbots would be most beneficial for scheduling doctor appointments (78%, 78/100), locating health clinics (76%, 76/100), or providing medication information (71%, 71/100). Conversely, many physicians believed that chatbots cannot effectively care for all of the patients' needs (76%, 76/100), cannot display human emotion (72%, 72/100), and cannot provide detailed diagnosis and treatment because of not knowing all of the personal factors associated with the patient (71%, 71/100). 
Many physicians also stated that health care chatbots could be a risk to patients if they self-diagnose too often (74%, 74/100) and do not accurately understand the diagnoses (74%, 74/100). Conclusions: Physicians believed in both costs and benefits associated with chatbots, depending on the logistics and specific roles of the technology. Chatbots may have a beneficial role to play in health care to support, motivate, and coach patients as well as for streamlining organizational tasks; in essence, chatbots could become a surrogate for nonmedical caregivers. However, concerns remain on the inability of chatbots to comprehend the emotional state of humans as well as in areas where expert medical knowledge and intelligence is required. UR - https://www.jmir.org/2019/4/e12887/ UR - http://dx.doi.org/10.2196/12887 UR - http://www.ncbi.nlm.nih.gov/pubmed/30950796 ID - info:doi/10.2196/12887 ER - TY - JOUR AU - Tielman, L. Myrthe AU - Neerincx, A. Mark AU - Brinkman, Willem-Paul PY - 2019/03/27 TI - Design and Evaluation of Personalized Motivational Messages by a Virtual Agent that Assists in Post-Traumatic Stress Disorder Therapy JO - J Med Internet Res SP - e9240 VL - 21 IS - 3 KW - mental health KW - motivation KW - trust KW - user-computer interface KW - PTSD KW - computer assisted therapy N2 - Background: Systems incorporating virtual agents can play a major role in electronic-mental (e-mental) health care, as barriers to care still prevent some patients from receiving the help they need. To properly assist the users of these systems, a virtual agent needs to promote motivation. This can be done by offering motivational messages. Objective: The objective of this study was two-fold. The first was to build a motivational message system for a virtual agent assisting in post-traumatic stress disorder (PTSD) therapy based on domain knowledge from experts. The second was to test the hypotheses that (1) computer-generated motivating messages influence users' 
motivation to continue with therapy, trust in a good therapy outcome, and the feeling of being heard by the agent and (2) personalized messages outperform generic messages on these factors. Methods: A system capable of generating motivational messages was built by analyzing expert (N=13) knowledge on what types of motivational statements to use in what situation. To test the 2 hypotheses, a Web-based study was performed (N=207). Participants were asked to imagine they were in a certain situation, specified by the progression of their symptoms and initial trust in a good therapy outcome. After this, they received a message from a virtual agent containing either personalized motivation as generated by the system, general motivation, or no motivational content. They were asked how this message changed their motivation to continue and trust in a good outcome as well as how much they felt they were being heard by the agent. Results: Overall, findings confirmed the first hypothesis, as well as the second hypothesis for the measure feeling of being heard by the agent. Personalization of the messages was also shown to be important in those situations where the symptoms were getting worse. In these situations, personalized messages outperformed general messages both in terms of motivation to continue and trust in a good therapy outcome. Conclusions: Expert input can successfully be used to develop a personalized motivational message system. Messages generated by such a system seem to improve people's motivation and trust in PTSD therapy as well as the user's feeling of being heard by a virtual agent. Given the importance of motivation, trust, and therapeutic alliance for successful therapy, we anticipate that the proposed system can improve adherence in e-mental therapy for PTSD and that it can provide a blueprint for the development of an adaptive system for persuasive messages based on expert input. 
UR - http://www.jmir.org/2019/3/e9240/ UR - http://dx.doi.org/10.2196/jmir.9240 UR - http://www.ncbi.nlm.nih.gov/pubmed/30916660 ID - info:doi/10.2196/jmir.9240 ER - TY - JOUR AU - Luk, Tsun Tzu AU - Wong, Wing Sze AU - Lee, Jae Jung AU - Chan, Siu-Chee Sophia AU - Lam, Hing Tai AU - Wang, Ping Man PY - 2019/01/31 TI - Exploring Community Smokers' Perspectives for Developing a Chat-Based Smoking Cessation Intervention Delivered Through Mobile Instant Messaging: Qualitative Study JO - JMIR Mhealth Uhealth SP - e11954 VL - 7 IS - 1 KW - chat intervention KW - instant messaging KW - mHealth KW - mobile phone KW - social media KW - smoking cessation KW - tobacco dependence KW - WhatsApp N2 - Background: Advances in mobile communication technologies provide a promising avenue for the delivery of tobacco dependence treatment. Although mobile instant messaging (IM) apps (eg, WhatsApp, Facebook messenger, and WeChat) are an inexpensive and widely used communication tool, evidence on their use for promoting health behavior, including smoking cessation, is scarce. Objective: This study aims to explore the perception of using mobile IM as a modality to deliver a proposed chat intervention for smoking cessation in community smokers in Hong Kong, where the proportion of smartphone use is among the highest in the world. Methods: We conducted 5 focus group, semistructured qualitative interviews on a purposive sample of 15 male and 6 female current cigarette smokers (age 23-68 years) recruited from the community in Hong Kong. All interviews were audiotaped and transcribed. Two investigators independently analyzed the transcripts using thematic analyses. Results: Participants considered mobile IM as a feasible and acceptable platform for the delivery of a supportive smoking cessation intervention. The ability to provide more personalized and adaptive behavioral support was regarded as the most valued utility of the IM-based intervention. 
Other perceived utilities included improved perceived psychosocial support and identification of motivators to quit. In addition, participants provided suggestions on the content and design of the intervention, which may improve the acceptability and usability of the IM-based intervention. These include avoiding health warning information, using positive messaging, using former smokers as counselors, and adjusting the language style (spoken vs written) according to the recipients' preference. Conclusions: This qualitative study provides the first evidence that mobile IM may be an alternative mobile health platform for the delivery of a smoking cessation intervention. Furthermore, the findings inform the development of a chat-based, IM smoking cessation program being evaluated in a community trial. UR - https://mhealth.jmir.org/2019/1/e11954/ UR - http://dx.doi.org/10.2196/11954 UR - http://www.ncbi.nlm.nih.gov/pubmed/30702431 ID - info:doi/10.2196/11954 ER - TY - JOUR AU - Kramer, Jan-Niklas AU - Künzler, Florian AU - Mishra, Varun AU - Presset, Bastien AU - Kotz, David AU - Smith, Shawna AU - Scholz, Urte AU - Kowatsch, Tobias PY - 2019/01/31 TI - Investigating Intervention Components and Exploring States of Receptivity for a Smartphone App to Promote Physical Activity: Protocol of a Microrandomized Trial JO - JMIR Res Protoc SP - e11540 VL - 8 IS - 1 KW - physical activity KW - mHealth KW - walking KW - smartphone KW - incentives KW - self-regulation N2 - Background: Smartphones enable the implementation of just-in-time adaptive interventions (JITAIs) that tailor the delivery of health interventions over time to user- and time-varying context characteristics. Ideally, JITAIs include effective intervention components, and delivery tailoring is based on effective moderators of intervention effects. Using machine learning techniques to infer each user's context from smartphone sensor data is a promising approach to further enhance tailoring. 
Objective: The primary objective of this study is to quantify main effects, interactions, and moderators of 3 intervention components of a smartphone-based intervention for physical activity. The secondary objective is the exploration of participants' states of receptivity, that is, situations in which participants are more likely to react to intervention notifications through collection of smartphone sensor data. Methods: In 2017, we developed the Assistant to Lift your Level of activitY (Ally), a chatbot-based mobile health intervention for increasing physical activity that utilizes incentives, planning, and self-monitoring prompts to help participants meet personalized step goals. We used a microrandomized trial design to meet the study objectives. Insurees of a large Swiss insurance company were invited to use the Ally app over a 12-day baseline and a 6-week intervention period. Upon enrollment, participants were randomly allocated to either a financial incentive, a charity incentive, or a no incentive condition. Over the course of the intervention period, participants were repeatedly randomized on a daily basis to either receive prompts that support self-monitoring or not and on a weekly basis to receive 1 of 2 planning interventions or no planning. Participants completed a Web-based questionnaire at baseline and postintervention follow-up. Results: Data collection was completed in January 2018. In total, 274 insurees (mean age 41.73 years; 57.7% [158/274] female) enrolled in the study and installed the Ally app on their smartphones. Main reasons for declining participation were having an incompatible smartphone (37/191, 19.4%) and collection of sensor data (35/191, 18.3%). Step data are available for 227 (82.8%, 227/274) participants, and smartphone sensor data are available for 247 (90.1%, 247/274) participants. Conclusions: This study describes the evidence-based development of a JITAI for increasing physical activity. 
If components prove to be efficacious, they will be included in a revised version of the app that offers scalable promotion of physical activity at low cost. Trial Registration: ClinicalTrials.gov NCT03384550; https://clinicaltrials.gov/ct2/show/NCT03384550 (Archived by WebCite at http://www.webcitation.org/74IgCiK3d) International Registered Report Identifier (IRRID): DERR1-10.2196/11540 UR - http://www.researchprotocols.org/2019/1/e11540/ UR - http://dx.doi.org/10.2196/11540 UR - http://www.ncbi.nlm.nih.gov/pubmed/30702430 ID - info:doi/10.2196/11540 ER - TY - JOUR AU - Fulmer, Russell AU - Joerin, Angela AU - Gentile, Breanna AU - Lakerink, Lysanne AU - Rauws, Michiel PY - 2018/12/13 TI - Using Psychological Artificial Intelligence (Tess) to Relieve Symptoms of Depression and Anxiety: Randomized Controlled Trial JO - JMIR Ment Health SP - e64 VL - 5 IS - 4 KW - artificial intelligence KW - mental health services KW - depression KW - anxiety KW - students N2 - Background: Students in need of mental health care face many barriers including cost, location, availability, and stigma. Studies show that computer-assisted therapy and 1 conversational chatbot delivering cognitive behavioral therapy (CBT) offer a less-intensive and more cost-effective alternative for treating depression and anxiety. Although CBT is one of the most effective treatment methods, applying an integrative approach has been linked to equally effective posttreatment improvement. Integrative psychological artificial intelligence (AI) offers a scalable solution as the demand for affordable, convenient, lasting, and secure support grows. Objective: This study aimed to assess the feasibility and efficacy of using an integrative psychological AI, Tess, to reduce self-identified symptoms of depression and anxiety in college students. Methods: In this randomized controlled trial, 75 participants were recruited from 15 universities across the United States. 
All participants completed Web-based surveys, including the Patient Health Questionnaire (PHQ-9), Generalized Anxiety Disorder Scale (GAD-7), and Positive and Negative Affect Scale (PANAS) at baseline and 2 to 4 weeks later (T2). The 2 test groups consisted of 50 participants in total and were randomized to receive unlimited access to Tess for either 2 weeks (n=24) or 4 weeks (n=26). The information-only control group participants (n=24) received an electronic link to the National Institute of Mental Health's (NIMH) eBook on depression among college students and were only granted access to Tess after completion of the study. Results: A sample of 74 participants completed this study with 0% attrition from the test group and less than 1% attrition from the control group (1/24). The average age of participants was 22.9 years, with 70% of participants being female (52/74), mostly Asian (37/74, 51%), and white (32/74, 41%). Group 1 received unlimited access to Tess, with daily check-ins for 2 weeks. Group 2 received unlimited access to Tess with biweekly check-ins for 4 weeks. The information-only control group was provided with an electronic link to the NIMH's eBook. Multivariate analysis of covariance was conducted. We used an alpha level of .05 for all statistical tests. Results revealed a statistically significant difference between the control group and group 1, such that group 1 reported a significant reduction in symptoms of depression as measured by the PHQ-9 (P=.03), whereas those in the control group did not. A statistically significant difference was found between the control group and both test groups 1 and 2 for symptoms of anxiety as measured by the GAD-7. Group 1 (P=.045) and group 2 (P=.02) reported a significant reduction in symptoms of anxiety, whereas the control group did not. A statistically significant difference was found on the PANAS between the control group and group 1 (P=.03) and suggests that Tess did impact scores. 
Conclusions: This study offers evidence that AI can serve as a cost-effective and accessible therapeutic agent. Although not designed to appropriate the role of a trained therapist, integrative psychological AI emerges as a feasible option for delivering support. Trial Registration: International Standard Randomized Controlled Trial Number: ISRCTN61214172; https://doi.org/10.1186/ISRCTN61214172. UR - http://mental.jmir.org/2018/4/e64/ UR - http://dx.doi.org/10.2196/mental.9782 UR - http://www.ncbi.nlm.nih.gov/pubmed/30545815 ID - info:doi/10.2196/mental.9782 ER - TY - JOUR AU - Inkster, Becky AU - Sarda, Shubhankar AU - Subramanian, Vinod PY - 2018/11/23 TI - An Empathy-Driven, Conversational Artificial Intelligence Agent (Wysa) for Digital Mental Well-Being: Real-World Data Evaluation Mixed-Methods Study JO - JMIR Mhealth Uhealth SP - e12106 VL - 6 IS - 11 KW - mental health KW - conversational agents KW - artificial intelligence KW - chatbots KW - coping skills KW - resilience, psychological KW - depression KW - mHealth KW - emotions KW - empathy N2 - Background: A World Health Organization 2017 report stated that major depression affects almost 5% of the human population. Major depression is associated with impaired psychosocial functioning and reduced quality of life. Challenges such as shortage of mental health personnel, long waiting times, perceived stigma, and lower government spending pose barriers to the alleviation of mental health problems. Face-to-face psychotherapy alone provides only point-in-time support and cannot scale quickly enough to address this growing global public health challenge. Artificial intelligence (AI)-enabled, empathetic, and evidence-driven conversational mobile app technologies could play an active role in filling this gap by increasing adoption and enabling reach. Although such technologies can help manage these barriers, they should never replace time with a health care professional for more severe mental health problems. 
However, app technologies could act as a supplementary or intermediate support system. Mobile mental well-being apps need to uphold privacy and foster both short- and long-term positive outcomes. Objective: This study aimed to present a preliminary real-world data evaluation of the effectiveness and engagement levels of an AI-enabled, empathetic, text-based conversational mobile mental well-being app, Wysa, on users with self-reported symptoms of depression. Methods: In the study, a group of anonymous global users were observed who voluntarily installed the Wysa app, engaged in text-based messaging, and self-reported symptoms of depression using the Patient Health Questionnaire-9. On the basis of the extent of app usage on and between 2 consecutive screening time points, 2 distinct groups of users (high users and low users) emerged. The study used mixed-methods approach to evaluate the impact and engagement levels among these users. The quantitative analysis measured the app impact by comparing the average improvement in symptoms of depression between high and low users. The qualitative analysis measured the app engagement and experience by analyzing in-app user feedback and evaluated the performance of a machine learning classifier to detect user objections during conversations. Results: The average mood improvement (ie, difference in pre- and post-self-reported depression scores) between the groups (ie, high vs low users; n=108 and n=21, respectively) revealed that the high users group had significantly higher average improvement (mean 5.84 [SD 6.66]) compared with the low users group (mean 3.52 [SD 6.15]); Mann-Whitney P=.03 and with a moderate effect size of 0.63. Moreover, 67.7% of user-provided feedback responses found the app experience helpful and encouraging. Conclusions: The real-world data evaluation findings on the effectiveness and engagement levels of Wysa app on users with self-reported symptoms of depression show promise. 
However, further work is required to validate these initial findings in much larger samples and across longer periods. UR - http://mhealth.jmir.org/2018/11/e12106/ UR - http://dx.doi.org/10.2196/12106 UR - http://www.ncbi.nlm.nih.gov/pubmed/30470676 ID - info:doi/10.2196/12106 ER - TY - JOUR AU - Bickmore, W. Timothy AU - Trinh, Ha AU - Olafsson, Stefan AU - O'Leary, K. Teresa AU - Asadi, Reza AU - Rickles, M. Nathaniel AU - Cruz, Ricardo PY - 2018/09/04 TI - Patient and Consumer Safety Risks When Using Conversational Assistants for Medical Information: An Observational Study of Siri, Alexa, and Google Assistant JO - J Med Internet Res SP - e11510 VL - 20 IS - 9 KW - conversational assistant KW - conversational interface KW - dialogue system KW - medical error KW - patient safety N2 - Background: Conversational assistants, such as Siri, Alexa, and Google Assistant, are ubiquitous and are beginning to be used as portals for medical services. However, the potential safety issues of using conversational assistants for medical information by patients and consumers are not understood. Objective: To determine the prevalence and nature of the harm that could result from patients or consumers using conversational assistants for medical information. Methods: Participants were given medical problems to pose to Siri, Alexa, or Google Assistant, and asked to determine an action to take based on information from the system. Assignment of tasks and systems were randomized across participants, and participants queried the conversational assistants in their own words, making as many attempts as needed until they either reported an action to take or gave up. Participant-reported actions for each medical task were rated for patient harm using an Agency for Healthcare Research and Quality harm scale. Results: Fifty-four subjects completed the study with a mean age of 42 years (SD 18). Twenty-nine (54%) were female, 31 (57%) Caucasian, and 26 (50%) were college educated. 
Only 8 (15%) reported using a conversational assistant regularly, while 22 (41%) had never used one, and 24 (44%) had tried one "a few times." Forty-four (82%) used computers regularly. Subjects were only able to complete 168 (43%) of their 394 tasks. Of these, 49 (29%) reported actions that could have resulted in some degree of patient harm, including 27 (16%) that could have resulted in death. Conclusions: Reliance on conversational assistants for actionable medical information represents a safety risk for patients and consumers. Patients should be cautioned to not use these technologies for answers to medical questions they intend to act on without further consultation from a health care provider. UR - http://www.jmir.org/2018/9/e11510/ UR - http://dx.doi.org/10.2196/11510 UR - http://www.ncbi.nlm.nih.gov/pubmed/30181110 ID - info:doi/10.2196/11510 ER - TY - JOUR AU - Suganuma, Shinichiro AU - Sakamoto, Daisuke AU - Shimoyama, Haruhiko PY - 2018/07/31 TI - An Embodied Conversational Agent for Unguided Internet-Based Cognitive Behavior Therapy in Preventative Mental Health: Feasibility and Acceptability Pilot Trial JO - JMIR Ment Health SP - e10454 VL - 5 IS - 3 KW - embodied conversational agent KW - cognitive behavioral therapy KW - psychological distress KW - mental well-being KW - artificial intelligence technology N2 - Background: Recent years have seen an increase in the use of internet-based cognitive behavioral therapy in the area of mental health. Although lower effectiveness and higher dropout rates of unguided than those of guided internet-based cognitive behavioral therapy remain critical issues, not incurring ongoing human clinical resources makes it highly advantageous. Objective: Current research in psychotherapy, which acknowledges the importance of therapeutic alliance, aims to evaluate the feasibility and acceptability, in terms of mental health, of an application that is embodied with a conversational agent.
This application was enabled for use as an internet-based cognitive behavioral therapy preventative mental health measure. Methods: Data from the 191 participants in the experimental group (mean age 38.07, SD 10.75 years) and the 263 participants in the control group (mean age 38.05, SD 13.45 years) were analyzed using a 2-way factorial analysis of variance (group × time). Results: There was a significant main effect (P=.02) and interaction for time on the variable of positive mental health (P=.02), and for the treatment group, a significant simple main effect was also found (P=.002). In addition, there was a significant main effect (P=.02) and interaction for time on the variable of negative mental health (P=.005), and for the treatment group, a significant simple main effect was also found (P=.001). Conclusions: This research provides preliminary evidence for the mental health application developed herein, indicating empirically that internet-based cognitive behavioral therapy with an embodied conversational agent can be used in mental health care. Given the feasibility and acceptability issues observed in this pilot trial, it is necessary to pursue higher-quality evidence while continuing to improve the application, based on the findings of the current research. UR - http://mental.jmir.org/2018/3/e10454/ UR - http://dx.doi.org/10.2196/10454 UR - http://www.ncbi.nlm.nih.gov/pubmed/30064969 ID - info:doi/10.2196/10454 ER - TY - JOUR AU - Morris, R. Robert AU - Kouddous, Kareem AU - Kshirsagar, Rohan AU - Schueller, M.
Stephen PY - 2018/06/26 TI - Towards an Artificially Empathic Conversational Agent for Mental Health Applications: System Design and User Perceptions JO - J Med Internet Res SP - e10148 VL - 20 IS - 6 KW - conversational agents KW - mental health KW - empathy KW - crowdsourcing KW - peer support N2 - Background: Conversational agents cannot yet express empathy in nuanced ways that account for the unique circumstances of the user. Agents that possess this faculty could be used to enhance digital mental health interventions. Objective: We sought to design a conversational agent that could express empathic support in ways that might approach, or even match, human capabilities. Another aim was to assess how users might appraise such a system. Methods: Our system used a corpus-based approach to simulate expressed empathy. Responses from an existing pool of online peer support data were repurposed by the agent and presented to the user. Information retrieval techniques and word embeddings were used to select historical responses that best matched a user's concerns. We collected ratings from 37,169 users to evaluate the system. Additionally, we conducted a controlled experiment (N=1284) to test whether the alleged source of a response (human or machine) might change user perceptions. Results: The majority of responses created by the agent (2986/3770, 79.20%) were deemed acceptable by users. However, users significantly preferred the efforts of their peers (P<.001). This effect was maintained in a controlled study (P=.02), even when the only difference in responses was whether they were framed as coming from a human or a machine. Conclusions: Our system illustrates a novel way for machines to construct nuanced and personalized empathic utterances. However, the design had significant limitations and further research is needed to make this approach viable.
Our controlled study suggests that even in ideal conditions, nonhuman agents may struggle to express empathy as well as humans. The ethical implications of empathic agents, as well as their potential iatrogenic effects, are also discussed. UR - http://www.jmir.org/2018/6/e10148/ UR - http://dx.doi.org/10.2196/10148 UR - http://www.ncbi.nlm.nih.gov/pubmed/29945856 ID - info:doi/10.2196/10148 ER - TY - JOUR AU - Martinez-Martin, Nicole AU - Kreitmair, Karola PY - 2018/04/23 TI - Ethical Issues for Direct-to-Consumer Digital Psychotherapy Apps: Addressing Accountability, Data Protection, and Consent JO - JMIR Ment Health SP - e32 VL - 5 IS - 2 KW - ethics KW - ethical issues KW - mental health KW - technology KW - telemedicine KW - mHealth KW - psychotherapy UR - http://mental.jmir.org/2018/2/e32/ UR - http://dx.doi.org/10.2196/mental.9423 UR - http://www.ncbi.nlm.nih.gov/pubmed/29685865 ID - info:doi/10.2196/mental.9423 ER - TY - JOUR AU - Howe, Esther AU - Pedrelli, Paola AU - Morris, Robert AU - Nyer, Maren AU - Mischoulon, David AU - Picard, Rosalind PY - 2017/09/22 TI - Feasibility of an Automated System Counselor for Survivors of Sexual Assault JO - iproc SP - e37 VL - 3 IS - 1 KW - CBT KW - web chat N2 - Background: Sexual assault (SA) is common and costly to individuals and society, and increases risk of mental health disorders. Stigma and cost of care discourage survivors from seeking help. Norms profiling survivors as heterosexual, cisgendered women dissuade LGBTQIA+ individuals and men from accessing care. Because individuals prefer disclosing sensitive information online rather than in person, online systems for counseling, such as instant messaging and chatbots, may bypass concerns about stigma. These systems' anonymity may increase disclosure and decrease impression management, the process by which individuals attempt to influence others' perceptions. Their low cost may expand reach of care. There are no known evidence-based chat platforms for SA survivors.
Objective: To examine feasibility of a chat platform with peer and automated system (chatbot) counseling interfaces to provide cognitive reappraisals (a cognitive behavioral therapy technique) to survivors. Methods: Participants are English-speaking, US-based survivors, 18+ years old. Participants are told they will be randomized to chat with a peer or automated system counselor 5 times over 2 weeks. In reality, all participants chat with a peer counselor. Chats employ a modified-for-context evidence-based cognitive reappraisal script developed by Koko, a company offering support services for emotional distress via social networks. At baseline, participants indicate counselor type preference and complete a basic demographic form, the Brief Fear of Negative Evaluation Scale, and self-disclosure items from the International Personality Item Pool. After 5 chats, participants complete questions from the Client Satisfaction Questionnaire (CSQ), Self-Reported Attitudes Toward Agent, and the Working Alliance Inventory. Hypotheses: 1) Online chatting and automated systems will be acceptable and feasible means of delivering cognitive reappraisals to survivors. 2) High impression management (IM≥25) and low self-disclosure (SD≤45) will be associated with preference for an automated system. 3) IM and SD will separately moderate the relationship between counselor assignment and participant satisfaction. Results: Ten participants have completed the study. Recruitment is ongoing. We will enroll 50+ participants by 10/2017 and outline findings at the Connected Health Conference. To date, 70% of participants completed all chats within 24 hours of enrollment, and 60% indicated a pre-chat preference for an automated system, suggesting acceptability of the concept. The post-chat CSQ mean total score of 3.98 on a 5-point Likert scale (1=Poor; 5=Excellent) suggests platform acceptability. Of the 50% reporting high IM, 60% indicated preference for an automated system.
Of the 30% reporting low SD, 33% reported preference for an automated system. At recruitment completion, ANOVA analyses will elucidate relationships between IM, SD, and counselor assignment. Correlation and linear regression analyses will show any moderating effect of IM and SD on the relationship between counselor assignment and participant satisfaction. Conclusions: Preliminary results suggest acceptability and feasibility of cognitive reappraisals via chat for survivors, and of the automated system counselor concept. Final results will explore relationships between SD, IM, counselor type, and participant satisfaction to inform the development of new platforms for survivors. UR - http://www.iproc.org/2017/1/e37/ UR - http://dx.doi.org/10.2196/iproc.8585 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/iproc.8585 ER - TY - JOUR AU - Hoermann, Simon AU - McCabe, L. Kathryn AU - Milne, N. David AU - Calvo, A. Rafael PY - 2017/7/21 TI - Application of Synchronous Text-Based Dialogue Systems in Mental Health Interventions: Systematic Review JO - J Med Internet Res SP - e267 VL - 19 IS - 8 KW - chat KW - dialog system KW - remote psychotherapy N2 - Background: Synchronous written conversations (or "chats") are becoming increasingly popular as Web-based mental health interventions. Therefore, it is of utmost importance to evaluate and summarize the quality of these interventions. Objective: The aim of this study was to review the current evidence for the feasibility and effectiveness of online one-on-one mental health interventions that use text-based synchronous chat. Methods: A systematic search was conducted of the databases relevant to this area of research (Medical Literature Analysis and Retrieval System Online [MEDLINE], PsycINFO, Central, Scopus, EMBASE, Web of Science, IEEE, and ACM). There were no specific selection criteria relating to the participant group.
Studies were included if they reported interventions with individual text-based synchronous conversations (ie, chat or text messaging) and a psychological outcome measure. Results: A total of 24 articles were included in this review. Interventions included a wide range of mental health targets (eg, anxiety, distress, depression, eating disorders, and addiction) and intervention designs. Overall, compared with the waitlist (WL) condition, studies showed significant and sustained improvements in mental health outcomes following synchronous text-based intervention, and posttreatment improvement equivalent, but not superior, to treatment as usual (TAU; eg, face-to-face and telephone counseling). Conclusions: Feasibility studies indicate substantial innovation in this area of mental health intervention, with studies utilizing trained volunteers and chatbot technologies to deliver interventions. While studies of efficacy show positive post-intervention gains, further research is needed to determine whether time requirements for this mode of intervention are feasible in clinical practice. UR - http://www.jmir.org/2017/8/e267/ UR - http://dx.doi.org/10.2196/jmir.7023 UR - http://www.ncbi.nlm.nih.gov/pubmed/28784594 ID - info:doi/10.2196/jmir.7023 ER - TY - JOUR AU - Fitzpatrick, Kara Kathleen AU - Darcy, Alison AU - Vierhile, Molly PY - 2017/06/06 TI - Delivering Cognitive Behavior Therapy to Young Adults With Symptoms of Depression and Anxiety Using a Fully Automated Conversational Agent (Woebot): A Randomized Controlled Trial JO - JMIR Ment Health SP - e19 VL - 4 IS - 2 KW - conversational agents KW - mobile mental health KW - mental health KW - chatbots KW - depression KW - anxiety KW - college students KW - digital health N2 - Background: Web-based cognitive-behavioral therapeutic (CBT) apps have demonstrated efficacy but are characterized by poor adherence. Conversational agents may offer a convenient, engaging way of getting support at any time.
Objective: The objective of the study was to determine the feasibility, acceptability, and preliminary efficacy of a fully automated conversational agent to deliver a self-help program for college students who self-identify as having symptoms of anxiety and depression. Methods: In an unblinded trial, 70 individuals age 18-28 years were recruited online from a university community social media site and were randomized to receive either 2 weeks (up to 20 sessions) of self-help content derived from CBT principles in a conversational format with a text-based conversational agent (Woebot) (n=34) or were directed to the National Institute of Mental Health ebook, "Depression in College Students," as an information-only control group (n=36). All participants completed Web-based versions of the 9-item Patient Health Questionnaire (PHQ-9), the 7-item Generalized Anxiety Disorder scale (GAD-7), and the Positive and Negative Affect Scale at baseline and 2-3 weeks later (T2). Results: Participants were on average 22.2 years old (SD 2.33), 67% female (47/70), mostly non-Hispanic (93%, 54/58), and Caucasian (79%, 46/58). Participants in the Woebot group engaged with the conversational agent an average of 12.14 (SD 2.23) times over the study period. No significant differences existed between the groups at baseline, and 83% (58/70) of participants provided data at T2 (17% attrition). Intent-to-treat univariate analysis of covariance revealed a significant group difference on depression such that those in the Woebot group significantly reduced their symptoms of depression over the study period as measured by the PHQ-9 (F=6.47; P=.01) while those in the information control group did not. In an analysis of completers, participants in both groups significantly reduced anxiety as measured by the GAD-7 (F1,54=9.24; P=.004). Participants' comments suggest that process factors were more influential on their acceptability of the program than content factors mirroring traditional therapy.
Conclusions: Conversational agents appear to be a feasible, engaging, and effective way to deliver CBT. UR - http://mental.jmir.org/2017/2/e19/ UR - http://dx.doi.org/10.2196/mental.7785 UR - http://www.ncbi.nlm.nih.gov/pubmed/28588005 ID - info:doi/10.2196/mental.7785 ER - TY - JOUR AU - Provoost, Simon AU - Lau, Ming Ho AU - Ruwaard, Jeroen AU - Riper, Heleen PY - 2017/05/09 TI - Embodied Conversational Agents in Clinical Psychology: A Scoping Review JO - J Med Internet Res SP - e151 VL - 19 IS - 5 KW - eHealth KW - review KW - embodied conversational agent KW - human computer interaction KW - clinical psychology KW - mental disorders KW - intelligent agent KW - health behavior N2 - Background: Embodied conversational agents (ECAs) are computer-generated characters that simulate key properties of human face-to-face conversation, such as verbal and nonverbal behavior. In Internet-based eHealth interventions, ECAs may be used for the delivery of automated human support factors. Objective: We aim to provide an overview of the technological and clinical possibilities, as well as the evidence base for ECA applications in clinical psychology, to inform health professionals about the activity in this field of research. Methods: Given the large variety of applied methodologies, types of applications, and scientific disciplines involved in ECA research, we conducted a systematic scoping review. Scoping reviews aim to map key concepts and types of evidence underlying an area of research, and answer less-specific questions than traditional systematic reviews. Systematic searches for ECA applications in the treatment of mood, anxiety, psychotic, autism spectrum, and substance use disorders were conducted in databases in the fields of psychology and computer science, as well as in interdisciplinary databases. Studies were included if they conveyed primary research findings on an ECA application that targeted one of the disorders. 
We mapped each study's background information, how the different disorders were addressed, how ECAs and users could interact with one another, methodological aspects, and the study's aims and outcomes. Results: This study included N=54 publications (N=49 studies). More than half of the studies (n=26) focused on autism treatment, and ECAs were used most often for social skills training (n=23). Applications ranged from simple reinforcement of social behaviors through emotional expressions to sophisticated multimodal conversational systems. Most applications (n=43) were still in the development and piloting phase, that is, not yet ready for routine practice evaluation or application. Few studies conducted controlled research into clinical effects of ECAs, such as a reduction in symptom severity. Conclusions: ECAs for mental disorders are emerging. State-of-the-art techniques, involving, for example, communication through natural language or nonverbal behavior, are increasingly being considered and adopted for psychotherapeutic interventions in ECA research with promising results. However, evidence on their clinical application remains scarce. At present, their value to clinical practice lies mostly in the experimental determination of critical human support factors. In the context of using ECAs as an adjunct to existing interventions with the aim of supporting users, important questions remain with regard to the personalization of ECAs' interaction with users, and the optimal timing and manner of providing support. To increase the evidence base with regard to Internet interventions, we propose an additional focus on low-tech ECA solutions that can be rapidly developed, tested, and applied in routine practice. UR - http://www.jmir.org/2017/5/e151/ UR - http://dx.doi.org/10.2196/jmir.6553 UR - http://www.ncbi.nlm.nih.gov/pubmed/28487267 ID - info:doi/10.2196/jmir.6553 ER - TY - JOUR AU - Gardiner, Paula AU - Negash, Lily N.
AU - Shamekhi, Ameneh AU - Bickmore, Timothy AU - Gergen-Barnett, Katherine AU - Lestoquoy, Sophia Anna AU - Stillman, Sarah PY - 2016/12/09 TI - Utilization of an Embodied Conversational Agent in an Integrative Medical Group Visit for Patients with Chronic Pain and Depression JO - iproc SP - e6 VL - 2 IS - 1 KW - integrative medicine KW - embodied conversational agent KW - group visits N2 - Background: This abstract will report on the feasibility of introducing an innovative eHealth technology called an Embodied Conversational Agent (ECA) into a diverse patient population with chronic pain and depression. Objective: The Integrative Medical Group Visit (IMGV) is a 9-week curriculum designed for patients with chronic pain and depression. The IMGV consists of 9 weekly group medical visits during which patients learn self-management for chronic pain and depression. Tablet computers with an ECA are given to each participant to reinforce the curriculum and self-care practices. The ECA reviews material covered in IMGV sessions and allows for participants to set healthy nutritional, exercise, and mindfulness goals. This clinical trial is ongoing across 3 sites in Boston, MA. Methods: Patients were recruited from Boston Medical Center, Codman Square Community Health Center, and DotHouse Health. Demographic characteristics collected include age, gender, race, ethnicity, and sexual orientation. Patients in the intervention were given a Dell tablet with an ECA for the duration of the study and were encouraged to interact with the ECA on a regular basis. The ECA reviewed material covered during group medical visits and served as a tool for participants to practice self-management and stress reduction techniques. Usage data were collected from the tablets at 9-weeks and at 21-weeks post enrollment. Results: In total, 75 patients were enrolled in the intervention. 
The majority of patients were female (83%), 60% identified as black/African American, and nearly 90% identified as non-Hispanic. The mean age in this sample was 50 years. Approximately half of patients (56%) reported regular computer use prior to the study. This abstract reports on usage data and on pain and depression outcomes. Patterns of utilization will be assessed from tablet usage data. These data will be used to assess potential associations between demographic data, amount of time spent using the ECA, and content delivered by the ECA. Conclusions: ECAs may represent one strategy to encourage patient use of self-management for pain and depression. ClinicalTrial: Clinicaltrials.gov NCT02262377; https://clinicaltrials.gov/ct2/show/NCT02262377 (Archived by WebCite at http://www.webcitation.org/6maRgLIT7). UR - http://www.iproc.org/2016/1/e6/ UR - http://dx.doi.org/10.2196/iproc.6099 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/iproc.6099 ER -